Posted to dev@beam.apache.org by Silviu Calinoiu <si...@google.com.INVALID> on 2016/06/03 13:13:04 UTC

Apache Beam for Python

Hi all,

My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
working on the Python SDK.  As the original Beam proposal (
https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
planning to merge the Python SDK into Beam. The Python SDK is in an early
stage of development (alpha milestone) and so this is a good time to move
the code without causing too much disruption to our customers.
Additionally, this enables the Beam community to contribute as soon as
possible.

The current state of the SDK is as follows:

   - Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/

   - Model: All main concepts are present.

   - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
     and has a framework for adding additional sources and sinks.

   - Runners: SDK has two pipeline runners: direct runner (in process, local
     execution) and Cloud Dataflow runner for batch pipelines (submit job to
     Google Dataflow service). The current direct runner is bounded only (batch
     execution) but there is work in progress to support unbounded (as in Java).

   - Testing: The code base has unit test coverage for all the modules and
     several integration and end to end tests (similar in coverage to the Java
     SDK). Streaming is not well tested end to end yet since Cloud Dataflow
     focused first on batch.

   - Docs: We have matching Python documentation for the features currently
     supported by Cloud Dataflow. The docs are on cloud.google.com (access
     only by whitelist due to the alpha stage of the project). Devin is working
     on the transition of all docs to Apache.


In the coming days and weeks we would like to prepare and start migrating the
code, and you should start seeing some pull requests. We also hope that the
Beam community will shape the SDK going forward. In particular, all the
model improvements implemented for Java (Runner API, etc.) will have
equivalents in Python once they stabilize. If you have any advice before we
start the journey please let us know.

The team that will join the Beam effort consists of me (Silviu Calinoiu),
Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
Robert Bradshaw (who is already an Apache Beam committer).

So let us know what you think!

Best regards,

Silviu

Re: Apache Beam for Python

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
I'm proposing just a folder containing the Python SDK, not necessarily
part of the Maven reactor.

Regards
JB

On 06/03/2016 03:34 PM, Silviu Calinoiu wrote:
> Hi JB,
> Thanks for the welcome! I come from the Python land so  I am not quite
> familiar with Maven. What do you mean by a Maven module? You mean an
> artifact so you can install things? In Python, people are used to packages
> downloaded from PyPI (pypi.python.org -- which is sort of Maven for
> Python). Whatever is the standard way of doing things in Apache we'll do
> it. Just asking for clarifications.
>
> By the way this discussion is very useful since we will have to iron out
> several details like this.
> Thanks,
> Silviu
>
> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> Hi Silviu,
>>
>> thanks for detailed update and great work !
>>
>> I would advise to create a:
>>
>> sdks/python
>>
>> Maven module to store the Python SDK.
>>
>> WDYT ?
>>
>> By the way, welcome aboard and great to have you all guys in the team !
>>
>> Regards
>> JB
>>
>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>
>>> Hi all,
>>>
>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
>>> working on the Python SDK.  As the original Beam proposal (
>>> https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
>>> planning to merge the Python SDK into Beam. The Python SDK is in an early
>>> stage of development (alpha milestone) and so this is a good time to move
>>> the code without causing too much disruption to our customers.
>>> Additionally, this enables the Beam community to contribute as soon as
>>> possible.
>>>
>>> The current state of the SDK is as follows:
>>>
>>>      -
>>>
>>>      Open-sourced at
>>> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>
>>>
>>>      -
>>>
>>>      Model: All main concepts are present.
>>>      -
>>>
>>>      I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
>>>      and has a framework for adding additional sources and sinks.
>>>      -
>>>
>>>      Runners: SDK has two pipeline runners: direct runner (in process,
>>> local
>>>      execution) and Cloud Dataflow runner for batch pipelines (submit job
>>> to
>>>      Google Dataflow service). The current direct runner is bounded only
>>> (batch
>>>      execution) but there is work in progress to support unbounded (as in
>>> Java).
>>>      -
>>>
>>>      Testing: The code base has unit test coverage for all the modules and
>>>      several integration and end to end tests (similar in coverage to the
>>> Java
>>>      SDK). Streaming is not well tested end to end yet since Cloud Dataflow
>>>      focused first on batch.
>>>      -
>>>
>>>      Docs: We have matching Python documentation for the features currently
>>>      supported by Cloud Dataflow. The docs are on cloud.google.com (access
>>>      only by whitelist due to the alpha stage of the project). Devin is
>>> working
>>>      on the transition of all docs to Apache.
>>>
>>>
>>> In the next days/weeks we would like to prepare and start migrating the
>>> code and you should start seeing some pull requests. We also hope that the
>>> Beam community will shape the SDK going forward. In particular, all the
>>> model improvements implemented for Java (Runner API, etc.) will have
>>> equivalents in Python once they stabilize. If you have any advice before
>>> we
>>> start the journey please let us know.
>>>
>>> The team that will join the Beam effort consists of me (Silviu Calinoiu),
>>> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
>>> Robert Bradshaw (who is already an Apache Beam committer).
>>>
>>> So let us know what you think!
>>>
>>> Best regards,
>>>
>>> Silviu
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Apache Beam for Python

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Awesome !

Thanks !
Regards
JB

On 06/03/2016 04:02 PM, Silviu Calinoiu wrote:
> Hi James,
> Yes, we fit right into sdks/python. I will send out a doc/proposal about
> where things go so people can comment. It will closely follow the Beam
> repository guidelines.
> Thanks,
> Silviu
>
> On Fri, Jun 3, 2016 at 6:51 AM, James Malone <jamesmalone@google.com.invalid
>> wrote:
>
>> Hey Silviu!
>>
>> I think JB is proposing we create a python directory in the sdks directory
>> in the root repository (and modify the configuration files accordingly):
>>
>>     https://github.com/apache/incubator-beam/tree/master/sdks
>>
>> This Beam document here titled "Apache Beam (Incubating): Repository
>> Structure" details the proposed repository structure and may be useful:
>>
>>
>>
>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>
>> Best,
>>
>> James
>>
>>
>>
>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <silviuc@google.com.invalid
>>>
>> wrote:
>>
>>> Hi JB,
>>> Thanks for the welcome! I come from the Python land so  I am not quite
>>> familiar with Maven. What do you mean by a Maven module? You mean an
>>> artifact so you can install things? In Python, people are used to
>> packages
>>> downloaded from PyPI (pypi.python.org -- which is sort of Maven for
>>> Python). Whatever is the standard way of doing things in Apache we'll do
>>> it. Just asking for clarifications.
>>>
>>> By the way this discussion is very useful since we will have to iron out
>>> several details like this.
>>> Thanks,
>>> Silviu
>>>
>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Silviu,
>>>>
>>>> thanks for detailed update and great work !
>>>>
>>>> I would advise to create a:
>>>>
>>>> sdks/python
>>>>
>>>> Maven module to store the Python SDK.
>>>>
>>>> WDYT ?
>>>>
>>>> By the way, welcome aboard and great to have you all guys in the team !
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow
>> team
>>>>> working on the Python SDK.  As the original Beam proposal (
>>>>> https://wiki.apache.org/incubator/BeamProposal) mentioned, we have
>> been
>>>>> planning to merge the Python SDK into Beam. The Python SDK is in an
>>> early
>>>>> stage of development (alpha milestone) and so this is a good time to
>>> move
>>>>> the code without causing too much disruption to our customers.
>>>>> Additionally, this enables the Beam community to contribute as soon as
>>>>> possible.
>>>>>
>>>>> The current state of the SDK is as follows:
>>>>>
>>>>>      -
>>>>>
>>>>>      Open-sourced at
>>>>> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>
>>>>>
>>>>>      -
>>>>>
>>>>>      Model: All main concepts are present.
>>>>>      -
>>>>>
>>>>>      I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>> connectors
>>>>>      and has a framework for adding additional sources and sinks.
>>>>>      -
>>>>>
>>>>>      Runners: SDK has two pipeline runners: direct runner (in process,
>>>>> local
>>>>>      execution) and Cloud Dataflow runner for batch pipelines (submit
>> job
>>>>> to
>>>>>      Google Dataflow service). The current direct runner is bounded
>> only
>>>>> (batch
>>>>>      execution) but there is work in progress to support unbounded (as
>> in
>>>>> Java).
>>>>>      -
>>>>>
>>>>>      Testing: The code base has unit test coverage for all the modules
>>> and
>>>>>      several integration and end to end tests (similar in coverage to
>> the
>>>>> Java
>>>>>      SDK). Streaming is not well tested end to end yet since Cloud
>>> Dataflow
>>>>>      focused first on batch.
>>>>>      -
>>>>>
>>>>>      Docs: We have matching Python documentation for the features
>>> currently
>>>>>      supported by Cloud Dataflow. The docs are on cloud.google.com
>>> (access
>>>>>      only by whitelist due to the alpha stage of the project). Devin is
>>>>> working
>>>>>      on the transition of all docs to Apache.
>>>>>
>>>>>
>>>>> In the next days/weeks we would like to prepare and start migrating
>> the
>>>>> code and you should start seeing some pull requests. We also hope that
>>> the
>>>>> Beam community will shape the SDK going forward. In particular, all
>> the
>>>>> model improvements implemented for Java (Runner API, etc.) will have
>>>>> equivalents in Python once they stabilize. If you have any advice
>> before
>>>>> we
>>>>> start the journey please let us know.
>>>>>
>>>>> The team that will join the Beam effort consists of me (Silviu
>>> Calinoiu),
>>>>> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
>>>>> Robert Bradshaw (who is already an Apache Beam committer).
>>>>>
>>>>> So let us know what you think!
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Silviu
>>>>>
>>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbonofre@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Apache Beam for Python

Posted by Silviu Calinoiu <si...@google.com.INVALID>.
Hi James,
Yes, we fit right into sdks/python. I will send out a doc/proposal about
where things go so people can comment. It will closely follow the Beam
repository guidelines.
Thanks,
Silviu

On Fri, Jun 3, 2016 at 6:51 AM, James Malone <jamesmalone@google.com.invalid
> wrote:

> Hey Silviu!
>
> I think JB is proposing we create a python directory in the sdks directory
> in the root repository (and modify the configuration files accordingly):
>
>    https://github.com/apache/incubator-beam/tree/master/sdks
>
> This Beam document here titled "Apache Beam (Incubating): Repository
> Structure" details the proposed repository structure and may be useful:
>
>
>
> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>
> Best,
>
> James
>
>
>
> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <silviuc@google.com.invalid
> >
> wrote:
>
> > Hi JB,
> > Thanks for the welcome! I come from the Python land so  I am not quite
> > familiar with Maven. What do you mean by a Maven module? You mean an
> > artifact so you can install things? In Python, people are used to
> packages
> > downloaded from PyPI (pypi.python.org -- which is sort of Maven for
> > Python). Whatever is the standard way of doing things in Apache we'll do
> > it. Just asking for clarifications.
> >
> > By the way this discussion is very useful since we will have to iron out
> > several details like this.
> > Thanks,
> > Silviu
> >
> > On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> > wrote:
> >
> > > Hi Silviu,
> > >
> > > thanks for detailed update and great work !
> > >
> > > I would advise to create a:
> > >
> > > sdks/python
> > >
> > > Maven module to store the Python SDK.
> > >
> > > WDYT ?
> > >
> > > By the way, welcome aboard and great to have you all guys in the team !
> > >
> > > Regards
> > > JB
> > >
> > > On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> > >
> > >> Hi all,
> > >>
> > >> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow
> team
> > >> working on the Python SDK.  As the original Beam proposal (
> > >> https://wiki.apache.org/incubator/BeamProposal) mentioned, we have
> been
> > >> planning to merge the Python SDK into Beam. The Python SDK is in an
> > early
> > >> stage of development (alpha milestone) and so this is a good time to
> > move
> > >> the code without causing too much disruption to our customers.
> > >> Additionally, this enables the Beam community to contribute as soon as
> > >> possible.
> > >>
> > >> The current state of the SDK is as follows:
> > >>
> > >>     -
> > >>
> > >>     Open-sourced at
> > >> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
> > >>
> > >>
> > >>     -
> > >>
> > >>     Model: All main concepts are present.
> > >>     -
> > >>
> > >>     I/O: SDK supports text (Google Cloud Storage) and BigQuery
> > connectors
> > >>     and has a framework for adding additional sources and sinks.
> > >>     -
> > >>
> > >>     Runners: SDK has two pipeline runners: direct runner (in process,
> > >> local
> > >>     execution) and Cloud Dataflow runner for batch pipelines (submit
> job
> > >> to
> > >>     Google Dataflow service). The current direct runner is bounded
> only
> > >> (batch
> > >>     execution) but there is work in progress to support unbounded (as
> in
> > >> Java).
> > >>     -
> > >>
> > >>     Testing: The code base has unit test coverage for all the modules
> > and
> > >>     several integration and end to end tests (similar in coverage to
> the
> > >> Java
> > >>     SDK). Streaming is not well tested end to end yet since Cloud
> > Dataflow
> > >>     focused first on batch.
> > >>     -
> > >>
> > >>     Docs: We have matching Python documentation for the features
> > currently
> > >>     supported by Cloud Dataflow. The docs are on cloud.google.com
> > (access
> > >>     only by whitelist due to the alpha stage of the project). Devin is
> > >> working
> > >>     on the transition of all docs to Apache.
> > >>
> > >>
> > >> In the next days/weeks we would like to prepare and start migrating
> the
> > >> code and you should start seeing some pull requests. We also hope that
> > the
> > >> Beam community will shape the SDK going forward. In particular, all
> the
> > >> model improvements implemented for Java (Runner API, etc.) will have
> > >> equivalents in Python once they stabilize. If you have any advice
> before
> > >> we
> > >> start the journey please let us know.
> > >>
> > >> The team that will join the Beam effort consists of me (Silviu
> > Calinoiu),
> > >> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
> > >> Robert Bradshaw (who is already an Apache Beam committer).
> > >>
> > >> So let us know what you think!
> > >>
> > >> Best regards,
> > >>
> > >> Silviu
> > >>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbonofre@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>

Re: Apache Beam for Python

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
Woo hoo!

On Tue, Jun 14, 2016 at 12:41 PM, Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> Awesome ! Thanks !
>
> Agree with Davor to create a feature branch.
>
> Regards
> JB
>
>
> On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:
>>
>> Thanks everybody for the welcome and feedback. The initial code move was
>> proposed as pull request #461 [1].
>>
>> Looking forward to working with everybody in the Beam community and
>> especially any Pythonistas out there.
>>
>> Thanks,
>> Silviu
>>
>> [1] https://github.com/apache/incubator-beam/pull/461
>>
>> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> Excellent guys, Welcome to Beam !
>>>
>>> I am looking for ways to integrate Beam with the standard notebook tools
>>> (Zeppelin / Jupyter [ipython]), so I am really happy to see the python SDK
>>> arriving to Beam, Awesome.
>>>
>>> Ismaël Mejía
>>>
>>> On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela <am...@gmail.com> wrote:
>>>
>>>> Welcome Python people ;)
>>>>
>>>> I know a few people who've been waiting for this one!
>>>>
>>>> On Fri, Jun 3, 2016, 19:53 Davor Bonaci <da...@google.com.invalid>
>>>
>>> wrote:
>>>>
>>>>
>>>>> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
>>>>>
>>>>> On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>>>> wrote:
>>>>>
>>>>>> Absolutely ;)
>>>>>>
>>>>>>
>>>>>> On 06/03/2016 03:51 PM, James Malone wrote:
>>>>>>
>>>>>>> Hey Silviu!
>>>>>>>
>>>>>>> I think JB is proposing we create a python directory in the sdks
>>>>>
>>>>> directory
>>>>>>>
>>>>>>> in the root repository (and modify the configuration files
>>>>
>>>> accordingly):
>>>>>>>
>>>>>>>
>>>>>>>      https://github.com/apache/incubator-beam/tree/master/sdks
>>>>>>>
>>>>>>> This Beam document here titled "Apache Beam (Incubating): Repository
>>>>>>> Structure" details the proposed repository structure and may be
>>>>
>>>> useful:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> James
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
>>>>>>> <si...@google.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi JB,
>>>>>>>>
>>>>>>>> Thanks for the welcome! I come from the Python land so  I am not
>>>>
>>>> quite
>>>>>>>>
>>>>>>>> familiar with Maven. What do you mean by a Maven module? You mean
>>>
>>> an
>>>>>>>>
>>>>>>>> artifact so you can install things? In Python, people are used to
>>>>>>>> packages
>>>>>>>> downloaded from PyPI (pypi.python.org -- which is sort of Maven
>>>
>>> for
>>>>>>>>
>>>>>>>> Python). Whatever is the standard way of doing things in Apache
>>>
>>> we'll
>>>>>
>>>>> do
>>>>>>>>
>>>>>>>> it. Just asking for clarifications.
>>>>>>>>
>>>>>>>> By the way this discussion is very useful since we will have to
>>>
>>> iron
>>>>>
>>>>> out
>>>>>>>>
>>>>>>>> several details like this.
>>>>>>>> Thanks,
>>>>>>>> Silviu
>>>>>>>>
>>>>>>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <
>>>>
>>>> jb@nanthrax.net>
>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi Silviu,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> thanks for detailed update and great work !
>>>>>>>>>
>>>>>>>>> I would advise to create a:
>>>>>>>>>
>>>>>>>>> sdks/python
>>>>>>>>>
>>>>>>>>> Maven module to store the Python SDK.
>>>>>>>>>
>>>>>>>>> WDYT ?
>>>>>>>>>
>>>>>>>>> By the way, welcome aboard and great to have you all guys in the
>>>>
>>>> team
>>>>>
>>>>> !
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My name is Silviu Calinoiu and I am a member of the Cloud
>>>
>>> Dataflow
>>>>>
>>>>> team
>>>>>>>>>>
>>>>>>>>>> working on the Python SDK.  As the original Beam proposal (
>>>>>>>>>> https://wiki.apache.org/incubator/BeamProposal) mentioned, we
>>>
>>> have
>>>>>>>>>>
>>>>>>>>>> been
>>>>>>>>>> planning to merge the Python SDK into Beam. The Python SDK is in
>>>
>>> an
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> early
>>>>>>>>
>>>>>>>>
>>>>>>>>> stage of development (alpha milestone) and so this is a good time
>>>
>>> to
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> move
>>>>>>>>
>>>>>>>>
>>>>>>>>> the code without causing too much disruption to our customers.
>>>>>>>>>>
>>>>>>>>>> Additionally, this enables the Beam community to contribute as
>>>
>>> soon
>>>>>
>>>>> as
>>>>>>>>>>
>>>>>>>>>> possible.
>>>>>>>>>>
>>>>>>>>>> The current state of the SDK is as follows:
>>>>>>>>>>
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       Open-sourced at
>>>>>>>>>> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       Model: All main concepts are present.
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>>>>>>>>>
>>>>>>>>> connectors
>>>>>>>>
>>>>>>>>
>>>>>>>>>       and has a framework for adding additional sources and sinks.
>>>>>>>>>>
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       Runners: SDK has two pipeline runners: direct runner (in
>>>>>
>>>>> process,
>>>>>>>>>>
>>>>>>>>>> local
>>>>>>>>>>       execution) and Cloud Dataflow runner for batch pipelines
>>>>
>>>> (submit
>>>>>>>>>>
>>>>>>>>>> job
>>>>>>>>>> to
>>>>>>>>>>       Google Dataflow service). The current direct runner is
>>>
>>> bounded
>>>>>>>>>>
>>>>>>>>>> only
>>>>>>>>>> (batch
>>>>>>>>>>       execution) but there is work in progress to support
>>>
>>> unbounded
>>>>>
>>>>> (as
>>>>>>>>>>
>>>>>>>>>> in
>>>>>>>>>> Java).
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       Testing: The code base has unit test coverage for all the
>>>>>
>>>>> modules
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> and
>>>>>>>>
>>>>>>>>
>>>>>>>>>       several integration and end to end tests (similar in coverage
>>>>
>>>> to
>>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>> Java
>>>>>>>>>>       SDK). Streaming is not well tested end to end yet since
>>>
>>> Cloud
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Dataflow
>>>>>>>>
>>>>>>>>
>>>>>>>>>       focused first on batch.
>>>>>>>>>>
>>>>>>>>>>       -
>>>>>>>>>>
>>>>>>>>>>       Docs: We have matching Python documentation for the features
>>>>>>>>>>
>>>>>>>>> currently
>>>>>>>>
>>>>>>>>
>>>>>>>>>       supported by Cloud Dataflow. The docs are on
>>>
>>> cloud.google.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> (access
>>>>>>>>
>>>>>>>>
>>>>>>>>>       only by whitelist due to the alpha stage of the project).
>>>
>>> Devin
>>>>>
>>>>> is
>>>>>>>>>>
>>>>>>>>>> working
>>>>>>>>>>       on the transition of all docs to Apache.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In the next days/weeks we would like to prepare and start
>>>
>>> migrating
>>>>>
>>>>> the
>>>>>>>>>>
>>>>>>>>>> code and you should start seeing some pull requests. We also hope
>>>>>
>>>>> that
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> the
>>>>>>>>
>>>>>>>>
>>>>>>>>> Beam community will shape the SDK going forward. In particular,
>>>
>>> all
>>>>>
>>>>> the
>>>>>>>>>>
>>>>>>>>>> model improvements implemented for Java (Runner API, etc.) will
>>>>
>>>> have
>>>>>>>>>>
>>>>>>>>>> equivalents in Python once they stabilize. If you have any advice
>>>>>>>>>> before
>>>>>>>>>> we
>>>>>>>>>> start the journey please let us know.
>>>>>>>>>>
>>>>>>>>>> The team that will join the Beam effort consists of me (Silviu
>>>>>>>>>>
>>>>>>>>> Calinoiu),
>>>>>>>>
>>>>>>>>
>>>>>>>>> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not
>>>>
>>>> least
>>>>>>>>>>
>>>>>>>>>> Robert Bradshaw (who is already an Apache Beam committer).
>>>>>>>>>>
>>>>>>>>>> So let us know what you think!
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Silviu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbonofre@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Jean-Baptiste Onofré
>>>>>> jbonofre@apache.org
>>>>>> http://blog.nanthrax.net
>>>>>> Talend - http://www.talend.com
>>>>>>
>>>>>
>>>>
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: Apache Beam for Python

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Awesome ! Thanks !

Agree with Davor to create a feature branch.

Regards
JB

On 06/14/2016 09:22 PM, Silviu Calinoiu wrote:
> Thanks everybody for the welcome and feedback. The initial code move was
> proposed as pull request #461 [1].
>
> Looking forward to working with everybody in the Beam community and
> especially any Pythonistas out there.
>
> Thanks,
> Silviu
>
> [1] https://github.com/apache/incubator-beam/pull/461
>
> On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Excellent guys, Welcome to Beam !
>>
>> I am looking for ways to integrate Beam with the standard notebook tools
>> (Zeppelin / Jupyter [ipython]), so I am really happy to see the python SDK
>> arriving to Beam, Awesome.
>>
>> Ismaël Mejía
>>
>> On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela <am...@gmail.com> wrote:
>>
>>> Welcome Python people ;)
>>>
>>> I know a few people who've been waiting for this one!
>>>
>>> On Fri, Jun 3, 2016, 19:53 Davor Bonaci <da...@google.com.invalid>
>> wrote:
>>>
>>>> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!
>>>>
>>>> On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Absolutely ;)
>>>>>
>>>>>
>>>>> On 06/03/2016 03:51 PM, James Malone wrote:
>>>>>
>>>>>> Hey Silviu!
>>>>>>
>>>>>> I think JB is proposing we create a python directory in the sdks
>>>> directory
>>>>>> in the root repository (and modify the configuration files
>>> accordingly):
>>>>>>
>>>>>>      https://github.com/apache/incubator-beam/tree/master/sdks
>>>>>>
>>>>>> This Beam document here titled "Apache Beam (Incubating): Repository
>>>>>> Structure" details the proposed repository structure and may be
>>> useful:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> James
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu
>>>>>> <si...@google.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>> Hi JB,
>>>>>>> Thanks for the welcome! I come from the Python land so  I am not
>>> quite
>>>>>>> familiar with Maven. What do you mean by a Maven module? You mean
>> an
>>>>>>> artifact so you can install things? In Python, people are used to
>>>>>>> packages
>>>>>>> downloaded from PyPI (pypi.python.org -- which is sort of Maven
>> for
>>>>>>> Python). Whatever is the standard way of doing things in Apache
>> we'll
>>>> do
>>>>>>> it. Just asking for clarifications.
>>>>>>>
>>>>>>> By the way this discussion is very useful since we will have to
>> iron
>>>> out
>>>>>>> several details like this.
>>>>>>> Thanks,
>>>>>>> Silviu
>>>>>>>
>>>>>>> On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <
>>> jb@nanthrax.net>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Silviu,
>>>>>>>>
>>>>>>>> thanks for detailed update and great work !
>>>>>>>>
>>>>>>>> I would advise to create a:
>>>>>>>>
>>>>>>>> sdks/python
>>>>>>>>
>>>>>>>> Maven module to store the Python SDK.
>>>>>>>>
>>>>>>>> WDYT ?
>>>>>>>>
>>>>>>>> By the way, welcome aboard and great to have you all guys in the
>>> team
>>>> !
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> My name is Silviu Calinoiu and I am a member of the Cloud
>> Dataflow
>>>> team
>>>>>>>>> working on the Python SDK.  As the original Beam proposal (
>>>>>>>>> https://wiki.apache.org/incubator/BeamProposal) mentioned, we
>> have
>>>>>>>>> been
>>>>>>>>> planning to merge the Python SDK into Beam. The Python SDK is in
>> an
>>>>>>>>>
>>>>>>>> early
>>>>>>>
>>>>>>>> stage of development (alpha milestone) and so this is a good time
>> to
>>>>>>>>>
>>>>>>>> move
>>>>>>>
>>>>>>>> the code without causing too much disruption to our customers.
>>>>>>>>> Additionally, this enables the Beam community to contribute as
>> soon
>>>> as
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> The current state of the SDK is as follows:
>>>>>>>>>
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       Open-sourced at
>>>>>>>>> https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       Model: All main concepts are present.
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       I/O: SDK supports text (Google Cloud Storage) and BigQuery
>>>>>>>>>
>>>>>>>> connectors
>>>>>>>
>>>>>>>>       and has a framework for adding additional sources and sinks.
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       Runners: SDK has two pipeline runners: direct runner (in
>>>> process,
>>>>>>>>> local
>>>>>>>>>       execution) and Cloud Dataflow runner for batch pipelines
>>> (submit
>>>>>>>>> job
>>>>>>>>> to
>>>>>>>>>       Google Dataflow service). The current direct runner is
>> bounded
>>>>>>>>> only
>>>>>>>>> (batch
>>>>>>>>>       execution) but there is work in progress to support
>> unbounded
>>>> (as
>>>>>>>>> in
>>>>>>>>> Java).
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       Testing: The code base has unit test coverage for all the
>>>> modules
>>>>>>>>>
>>>>>>>> and
>>>>>>>
>>>>>>>>       several integration and end to end tests (similar in coverage
>>> to
>>>>>>>>> the
>>>>>>>>> Java
>>>>>>>>>       SDK). Streaming is not well tested end to end yet since
>> Cloud
>>>>>>>>>
>>>>>>>> Dataflow
>>>>>>>
>>>>>>>>       focused first on batch.
>>>>>>>>>       -
>>>>>>>>>
>>>>>>>>>       Docs: We have matching Python documentation for the features
>>>>>>>>>
>>>>>>>> currently
>>>>>>>
>>>>>>>>       supported by Cloud Dataflow. The docs are on
>> cloud.google.com
>>>>>>>>>
>>>>>>>> (access
>>>>>>>
>>>>>>>>       only by whitelist due to the alpha stage of the project).
>> Devin
>>>> is
>>>>>>>>> working
>>>>>>>>>       on the transition of all docs to Apache.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In the next days/weeks we would like to prepare and start
>> migrating
>>>> the
>>>>>>>>> code and you should start seeing some pull requests. We also hope
>>>> that
>>>>>>>>>
>>>>>>>> the
>>>>>>>
>>>>>>>> Beam community will shape the SDK going forward. In particular,
>> all
>>>> the
>>>>>>>>> model improvements implemented for Java (Runner API, etc.) will
>>> have
>>>>>>>>> equivalents in Python once they stabilize. If you have any advice
>>>>>>>>> before
>>>>>>>>> we
>>>>>>>>> start the journey please let us know.
>>>>>>>>>
>>>>>>>>> The team that will join the Beam effort consists of me (Silviu
>>>>>>>>>
>>>>>>>> Calinoiu),
>>>>>>>
>>>>>>>> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not
>>> least
>>>>>>>>> Robert Bradshaw (who is already an Apache Beam committer).
>>>>>>>>>
>>>>>>>>> So let us know what you think!
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Silviu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbonofre@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbonofre@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>
>>>
>>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Apache Beam for Python

Posted by Davor Bonaci <da...@google.com.INVALID>.
Awesome job, Silviu! Really excited to have the Python SDK join us in Beam.

I'll take care of merging the pull request. Let's start with a feature
branch, as per previous conversations on the dev@ list.

On Tue, Jun 14, 2016 at 12:22 PM, Silviu Calinoiu <
silviuc@google.com.invalid> wrote:

> Thanks, everybody, for the welcome and the feedback. The initial code move
> was proposed as pull request #461 [1].
>
> Looking forward to working with everybody in the Beam community and
> especially any Pythonistas out there.
>
> Thanks,
> Silviu
>
> [1] https://github.com/apache/incubator-beam/pull/461

Re: Apache Beam for Python

Posted by Silviu Calinoiu <si...@google.com.INVALID>.
Thanks, everybody, for the welcome and the feedback. The initial code move
was proposed as pull request #461 [1].

Looking forward to working with everybody in the Beam community and
especially any Pythonistas out there.

Thanks,
Silviu

[1] https://github.com/apache/incubator-beam/pull/461

On Sat, Jun 4, 2016 at 12:35 AM, Ismaël Mejía <ie...@gmail.com> wrote:

> Excellent, guys. Welcome to Beam!
>
> I am looking for ways to integrate Beam with the standard notebook tools
> (Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
> arriving in Beam. Awesome.
>
> Ismaël Mejía

Re: Apache Beam for Python

Posted by Ismaël Mejía <ie...@gmail.com>.
Excellent, guys. Welcome to Beam!

I am looking for ways to integrate Beam with the standard notebook tools
(Zeppelin / Jupyter [IPython]), so I am really happy to see the Python SDK
arriving in Beam. Awesome.

Ismaël Mejía

On Fri, Jun 3, 2016 at 7:17 PM, Amit Sela <am...@gmail.com> wrote:

> Welcome Python people ;)
>
> I know a few people who've been waiting for this one!

Re: Apache Beam for Python

Posted by Amit Sela <am...@gmail.com>.
Welcome Python people ;)

I know a few people who've been waiting for this one!

On Fri, Jun 3, 2016, 19:53 Davor Bonaci <da...@google.com.invalid> wrote:

> Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

Re: Apache Beam for Python

Posted by Davor Bonaci <da...@google.com.INVALID>.
Welcome Python SDK, as well as Silviu, Charles, Ahmet and Chamikara!

On Fri, Jun 3, 2016 at 7:07 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Absolutely ;)

Re: Apache Beam for Python

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Absolutely ;)

On 06/03/2016 03:51 PM, James Malone wrote:
> Hey Silviu!
>
> I think JB is proposing we create a python directory in the sdks directory
> in the root repository (and modify the configuration files accordingly):
>
>     https://github.com/apache/incubator-beam/tree/master/sdks
>
> The Beam document titled "Apache Beam (Incubating): Repository
> Structure" details the proposed repository structure and may be useful:
>
>
> https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc
>
> Best,
>
> James

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: Apache Beam for Python

Posted by James Malone <ja...@google.com.INVALID>.
Hey Silviu!

I think JB is proposing we create a python directory in the sdks directory
in the root repository (and modify the configuration files accordingly):

   https://github.com/apache/incubator-beam/tree/master/sdks

The Beam document titled "Apache Beam (Incubating): Repository
Structure" details the proposed repository structure and may be useful:


https://drive.google.com/a/google.com/folderview?id=0B-IhJZh9Ab52OFBVZHpsNjc4eXc

Best,

James



On Fri, Jun 3, 2016 at 6:34 AM, Silviu Calinoiu <si...@google.com.invalid>
wrote:

> Hi JB,
> Thanks for the welcome! I come from Python land, so I am not very
> familiar with Maven. What do you mean by a Maven module? Do you mean an
> artifact that can be installed? In Python, people are used to packages
> downloaded from PyPI (pypi.python.org, which is sort of Maven for
> Python). Whatever the standard way of doing things in Apache is, we'll do
> it. I am just asking for clarification.
>
> By the way this discussion is very useful since we will have to iron out
> several details like this.
> Thanks,
> Silviu

Re: Apache Beam for Python

Posted by Silviu Calinoiu <si...@google.com.INVALID>.
Hi JB,
Thanks for the welcome! I come from Python land, so I am not very
familiar with Maven. What do you mean by a Maven module? Do you mean an
artifact that can be installed? In Python, people are used to packages
downloaded from PyPI (pypi.python.org, which is sort of Maven for
Python). Whatever the standard way of doing things in Apache is, we'll do
it. I am just asking for clarification.
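
To make the PyPI side of the comparison concrete, here is a minimal sketch
of the metadata a Python package usually declares, assuming a
setuptools-based setup.py placed under sdks/python. The distribution name,
version, and package name are hypothetical placeholders, not the SDK's
actual metadata.

```python
# Hypothetical packaging sketch. Every value below is an illustrative
# assumption, not the real metadata of the Python SDK.


def build_setup_kwargs(version="0.0.1.dev0"):
    """Return the keyword arguments a setup.py would pass to setuptools.setup()."""
    return {
        "name": "example-python-sdk",      # hypothetical distribution name on PyPI
        "version": version,                # hypothetical pre-release version
        "description": "Illustrative packaging sketch",
        "packages": ["example_sdk"],       # hypothetical import package
        "install_requires": [],            # runtime dependencies would go here
    }


def pip_install_command(kwargs):
    """Build the command a user would run once the package is published."""
    return "pip install {}=={}".format(kwargs["name"], kwargs["version"])


# In a real setup.py the dictionary would be handed to setuptools:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

The name and version pair plays roughly the role that Maven coordinates
play for a Java artifact, which is why a sdks/python directory could hold a
Python package alongside whatever the Maven build needs.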

By the way, this discussion is very useful since we will have to iron out
several details like this.
Thanks,
Silviu

On Fri, Jun 3, 2016 at 6:19 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Silviu,
>
> Thanks for the detailed update and the great work!
>
> I would advise creating a:
>
> sdks/python
>
> Maven module to store the Python SDK.
>
> WDYT?
>
> By the way, welcome aboard, and great to have all of you on the team!
>
> Regards
> JB
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: Apache Beam for Python

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Silviu,

Thanks for the detailed update and the great work!

I would advise creating a:

sdks/python

Maven module to store the Python SDK.

WDYT?

By the way, welcome aboard, and great to have all of you on the team!

Regards
JB

On 06/03/2016 03:13 PM, Silviu Calinoiu wrote:
> Hi all,
>
> My name is Silviu Calinoiu and I am a member of the Cloud Dataflow team
> working on the Python SDK.  As the original Beam proposal (
> https://wiki.apache.org/incubator/BeamProposal) mentioned, we have been
> planning to merge the Python SDK into Beam. The Python SDK is in an early
> stage of development (alpha milestone) and so this is a good time to move
> the code without causing too much disruption to our customers.
> Additionally, this enables the Beam community to contribute as soon as
> possible.
>
> The current state of the SDK is as follows:
>
>     - Open-sourced at https://github.com/GoogleCloudPlatform/DataflowPythonSDK/
>     - Model: All main concepts are present.
>     - I/O: SDK supports text (Google Cloud Storage) and BigQuery connectors
>       and has a framework for adding additional sources and sinks.
>     - Runners: SDK has two pipeline runners: direct runner (in process, local
>       execution) and Cloud Dataflow runner for batch pipelines (submit job to
>       Google Dataflow service). The current direct runner is bounded only (batch
>       execution) but there is work in progress to support unbounded (as in Java).
>     - Testing: The code base has unit test coverage for all the modules and
>       several integration and end to end tests (similar in coverage to the Java
>       SDK). Streaming is not well tested end to end yet since Cloud Dataflow
>       focused first on batch.
>     - Docs: We have matching Python documentation for the features currently
>       supported by Cloud Dataflow. The docs are on cloud.google.com (access
>       only by whitelist due to the alpha stage of the project). Devin is working
>       on the transition of all docs to Apache.
>
>
> In the next days/weeks we would like to prepare and start migrating the
> code and you should start seeing some pull requests. We also hope that the
> Beam community will shape the SDK going forward. In particular, all the
> model improvements implemented for Java (Runner API, etc.) will have
> equivalents in Python once they stabilize. If you have any advice before we
> start the journey please let us know.
>
> The team that will join the Beam effort consists of me (Silviu Calinoiu),
> Charles Chen, Ahmet Altay, Chamikara Jayalath, and last but not least
> Robert Bradshaw (who is already an Apache Beam committer).
>
> So let us know what you think!
>
> Best regards,
>
> Silviu
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com