Posted to dev@beam.apache.org by Taranbir Wraich <ta...@napkyn.com> on 2020/02/27 21:57:42 UTC

Issue in Dataflow runner for apache beam - Python SDK

Hi Team,

I am writing to get a better understanding of an issue I am facing with the
Apache Beam Python SDK.

I am running a Python Dataflow pipeline that reads data from BigQuery,
collates it into a single pandas DataFrame, and then processes it. I did not
have any issues while testing with the Direct runner, but ran into some
unexpected issues with the Dataflow runner. I have posted a question about
this on Stack Overflow, where I was advised to contact you for better
insight.

I am adding the link to the question. I would really appreciate it if
someone could help me figure out this issue.

<https://stackoverflow.com/questions/60437931/python-dataflow-dofn-class-function-finish-bundle-running-multiple-times-and-giv>

Thanks and Regards

Taranbir Wraich

Cloud Analyst

P: 613-366-7881 ext 13X

P: 888-243-4619 ext 13X

E: [taranbir.wraich@napkyn.com](mailto:taranbir.wraich@napkyn.com)

Napkyn Inc. | 888.243.4619 | [Twitter](https://twitter.com/napkyninc) |
[LinkedIn](https://www.linkedin.com/company/napkyn-inc.) |
[napkyn.com](https://www.napkyn.com/)

This email, including any attachments, is for the sole use of the intended
recipient and may contain confidential information. If you are not the
intended recipient, please immediately notify us by reply email or by
telephone, delete this email and destroy any copies. Thank you.



  


Re: Issue in Dataflow runner for apache beam - Python SDK

Posted by Kyle Weaver <kc...@google.com>.
Hi Taranbir,

I posted an answer to the Stack Overflow question. The summary is that you
should use a CombineFn instead of a plain DoFn.
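
As a rough illustration of that suggestion (not the code from the Stack
Overflow answer): finish_bundle runs once per bundle, and a runner such as
Dataflow may split the input into many bundles, so it can fire several times.
A CombineFn instead builds partial accumulators that the runner merges before
extract_output is called. The class name CollectRowsFn, the simulate helper,
and the sample rows below are hypothetical, and the fallback to a plain object
base class is only there so the sketch can run without Beam installed.

```python
try:
    import apache_beam as beam
    _CombineFnBase = beam.CombineFn
except ImportError:  # assumption: lets the sketch run without Beam installed
    _CombineFnBase = object


class CollectRowsFn(_CombineFnBase):
    """Hypothetical CombineFn that gathers every input row into one list."""

    def create_accumulator(self):
        # A fresh, empty partial result; the runner may create many of these.
        return []

    def add_input(self, accumulator, row):
        # Fold one element into a partial result.
        accumulator.append(row)
        return accumulator

    def merge_accumulators(self, accumulators):
        # Combine the partial results produced on different bundles/workers.
        merged = []
        for acc in accumulators:
            merged.extend(acc)
        return merged

    def extract_output(self, accumulator):
        # Receives the fully merged accumulator. A single pandas DataFrame
        # could be built here, e.g. pd.DataFrame(accumulator).
        return accumulator


def simulate(fn, bundles):
    """Mimic a runner: one accumulator per bundle, then merge, then extract."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for row in bundle:
            acc = fn.add_input(acc, row)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


# Two made-up bundles standing in for how a runner might split the input.
bundles = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
result = simulate(CollectRowsFn(), bundles)
```

In a real pipeline this would be applied with something like
`rows | beam.CombineGlobally(CollectRowsFn())`. Note that collecting every row
into one accumulator assumes the full dataset fits in a single worker's
memory.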

Hope that helps.

Kyle

On Thu, Feb 27, 2020 at 2:25 PM Taranbir Wraich <ta...@napkyn.com>
wrote:

> Hi Team,
>
> I am writing to get a better understanding of an issue I am facing with
> the Apache Beam - python .
>
> I am running a python dataflow pipeline which reads in data from big query
> then, collates the data in a single pandas data frame and then processes. I
> did not have any issues while testing using the Direct runner, but ran into
> some unexpected issues while using the Dataflow Runner. I have posted a
> question regarding the same at Stack overflow. I was advised to contact you
> guys for better insight into this.
>
> I am adding the link for the question. I would really appreciate if
> someone could help me figure out this issue.
>
>
> https://stackoverflow.com/questions/60437931/python-dataflow-dofn-class-function-finish-bundle-running-multiple-times-and-giv