Posted to user@beam.apache.org by Jacob Marble <jm...@kochava.com> on 2017/11/17 00:56:43 UTC

@DoFn.Setup not called

This one is weird.

A DoFn I wrote:
- stateful
- used plenty in a streaming pipeline
- direct and dataflow runners
- works fine

Now:
- new batch pipeline
- @DoFn.Setup method not called
- direct runner works properly (logs from setup method are output)
- dataflow runner simply doesn't call the setup method

Is this possibly a Beam misuse? The Javadoc for DoFn.Setup doesn't hint at
anything, so I'm suspecting a Dataflow bug?

Jacob

Re: @DoFn.Setup not called

Posted by Jacob Marble <jm...@kochava.com>.
Cool! Thanks Kenn.

Jacob

On Mon, Nov 20, 2017 at 9:57 AM, Kenneth Knowles <kl...@google.com> wrote:

> I wanted to follow up that this has been reproduced and diagnosed, and a
> fix is underway. The ticket to follow is
> https://issues.apache.org/jira/browse/BEAM-3219.
>
> Kenn

Re: @DoFn.Setup not called

Posted by Kenneth Knowles <kl...@google.com>.
I wanted to follow up that this has been reproduced and diagnosed, and a
fix is underway. The ticket to follow is
https://issues.apache.org/jira/browse/BEAM-3219.

Kenn

On Fri, Nov 17, 2017 at 12:23 PM, Jacob Marble <jm...@kochava.com> wrote:

> Here is a small pipeline job that fails using the Dataflow runner, but
> doesn't fail using the direct runner.
>
> https://gist.github.com/jacobmarble/804c2edb9c80a2863f3e671d6851a55f
>
> Jacob

Re: @DoFn.Setup not called

Posted by Kenneth Knowles <kl...@google.com>.
On Fri, Nov 17, 2017 at 8:38 PM, Jacob Marble <jm...@kochava.com> wrote:

> I also notice that stateful DoFn's seem to only be instantiated once in
> Dataflow, but multiple instances do end up being created in the direct
> runner. Is there a story behind that?
>

The runner is free to instantiate a DoFn as often as it likes, but
efficiency-oriented runners will do it as infrequently as possible. It
is only required to discard an instance when an error has occurred and
the instance state is presumed to be corrupted.

Kenn
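
The instance lifecycle described above can be sketched in plain Java. This is
only an illustration under stated assumptions: ToyRunner and CountingFn are
hypothetical names, no Beam classes are used, and nothing here reflects how
Dataflow or the direct runner is actually implemented. It shows a runner that
reuses one function instance across bundles and discards it only after an
error.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a DoFn; counts how many instances are created. */
class CountingFn {
    static int instances = 0;
    static int setups = 0;

    CountingFn() {
        instances++;
    }

    // In Beam this role is played by a method annotated with @Setup.
    void setup() {
        setups++;
    }

    // In Beam this role is played by a method annotated with @ProcessElement.
    void process(String element) {
        if (element.equals("bad")) {
            throw new IllegalStateException("simulated failure on: " + element);
        }
    }
}

/** Hypothetical runner that keeps one instance until a bundle fails. */
class ToyRunner {
    private final Supplier<CountingFn> factory;
    private CountingFn cached;

    ToyRunner(Supplier<CountingFn> factory) {
        this.factory = factory;
    }

    /** Processes one bundle; returns true if every element succeeded. */
    boolean runBundle(List<String> bundle) {
        if (cached == null) {
            // Create and set up an instance only when none is available.
            cached = factory.get();
            cached.setup();
        }
        try {
            for (String element : bundle) {
                cached.process(element);
            }
            return true;
        } catch (RuntimeException e) {
            // After an error the instance state is presumed corrupted,
            // so the instance is discarded and a fresh one is made next time.
            cached = null;
            return false;
        }
    }
}
```

Under these assumptions, two successful bundles share a single instance, and a
failed bundle causes exactly one more instance (and one more setup call) on the
next bundle. A runner that chose to create an instance per bundle would be
equally valid, which is one way the direct runner and Dataflow can differ in
how many instances they create.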



Re: @DoFn.Setup not called

Posted by Jacob Marble <jm...@kochava.com>.
I also notice that stateful DoFn's seem to only be instantiated once in
Dataflow, but multiple instances do end up being created in the direct
runner. Is there a story behind that?

Jacob

On Fri, Nov 17, 2017 at 7:22 PM, Jacob Marble <jm...@kochava.com> wrote:

> Noticing some related and unexpected differences between batch and
> streaming pipelines.
>
> Why does a stateful DoFn behave like GroupByKey (no data output until all
> data input is complete) in a batch pipeline, but not in a streaming
> pipeline? It looks like BatchStatefulParDoOverrides has something to do
> with it, but I can't figure out why, or how to work around it.
>
> In this current project:
> 1) read 500 million elements
> 2) very slow stateful DoFn (rate limited API calls)
> 3) write results
>
> To complete step 1 in a reasonable time, multiple workers are required,
> but Dataflow's autoscaling doesn't reduce the worker quantity when step 1
> completes. Since step 2 doesn't speed up with more workers, it would be
> best if it could start as soon as step 1 starts. This way, the job
> completes faster and uses fewer resources.
>
> Jacob

Re: @DoFn.Setup not called

Posted by Jacob Marble <jm...@kochava.com>.
Noticing some related and unexpected differences between batch and
streaming pipelines.

Why does a stateful DoFn behave like GroupByKey (no data output until all
data input is complete) in a batch pipeline, but not in a streaming
pipeline? It looks like BatchStatefulParDoOverrides has something to do
with it, but I can't figure out why, or how to work around it.

In this current project:
1) read 500 million elements
2) very slow stateful DoFn (rate limited API calls)
3) write results

To complete step 1 in a reasonable time, multiple workers are required, but
Dataflow's autoscaling doesn't reduce the worker quantity when step 1
completes. Since step 2 doesn't speed up with more workers, it would be
best if it could start as soon as step 1 starts. This way, the job
completes faster and uses fewer resources.

Jacob

On Fri, Nov 17, 2017 at 12:23 PM, Jacob Marble <jm...@kochava.com> wrote:

> Here is a small pipeline job that fails using the Dataflow runner, but
> doesn't fail using the direct runner.
>
> https://gist.github.com/jacobmarble/804c2edb9c80a2863f3e671d6851a55f
>
> Jacob

Re: @DoFn.Setup not called

Posted by Jacob Marble <jm...@kochava.com>.
Here is a small pipeline job that fails using the Dataflow runner, but
doesn't fail using the direct runner.

https://gist.github.com/jacobmarble/804c2edb9c80a2863f3e671d6851a55f

Jacob

On Fri, Nov 17, 2017 at 9:27 AM, Kenneth Knowles <kl...@google.com> wrote:

> It is definitely a big deal if @Setup is not getting called! There are no
> special cases that would skip @Setup. Please do report what you can.
>
> That said, lazily doing setup (via null check or some such as you mention)
> is perfectly fine and often a more robust programming pattern. Upside: you
> can't accidentally use uninitialized things. Downside: it might mask
> repeated initialization and only manifest as poor performance.
>
> Kenn

Re: @DoFn.Setup not called

Posted by Kenneth Knowles <kl...@google.com>.
It is definitely a big deal if @Setup is not getting called! There are no
special cases that would skip @Setup. Please do report what you can.

That said, lazily doing setup (via null check or some such as you mention)
is perfectly fine and often a more robust programming pattern. Upside: you
can't accidentally use uninitialized things. Downside: it might mask
repeated initialization and only manifest as poor performance.

Kenn
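
The lazy-setup pattern mentioned above (a null check in the bundle-start
method) might look roughly like the following. This is a minimal plain-Java
sketch, not Beam code: LazySetupFn and ExpensiveClient are hypothetical names,
and in a real DoFn the two methods would carry the @StartBundle and
@ProcessElement annotations.

```java
/** Hypothetical stand-in for an expensive resource such as an API client. */
class ExpensiveClient {
    // Counts constructions so that repeated initialization is visible.
    static int created = 0;

    ExpensiveClient() {
        created++;
    }

    String call(String input) {
        return "result:" + input;
    }
}

/** Plain-Java imitation of a DoFn that initializes its client lazily. */
class LazySetupFn {
    // Marked transient as it would be in a serializable DoFn:
    // the client is built on the worker, never shipped with the function.
    private transient ExpensiveClient client;

    // Would be annotated with @StartBundle in Beam.
    void startBundle() {
        // Lazy setup: initialize only if no earlier bundle already did.
        if (client == null) {
            client = new ExpensiveClient();
        }
    }

    // Would be annotated with @ProcessElement in Beam.
    String processElement(String element) {
        return client.call(element);
    }
}
```

With one instance reused across bundles, the client is constructed once, and
it can never be used uninitialized. The downside shows up when each bundle
gets a fresh instance: the null check passes every time, the client is rebuilt
per bundle, and the only symptom is slower execution (here, a growing
ExpensiveClient.created counter).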

On Fri, Nov 17, 2017 at 9:00 AM, Jacob Marble <jm...@kochava.com> wrote:

> I tried to write a simpler DoFn that induces the error, but it works fine.
> Working around the issue today by using @StartBundle with a null check, and
> that seems to be working.
>
> If this really is a big deal, then it needs to be reported, so I'll try to
> find time to write a broken example.
>
> Jacob

Re: @DoFn.Setup not called

Posted by Jacob Marble <jm...@kochava.com>.
I tried to write a simpler DoFn that induces the error, but it works fine.
Working around the issue today by using @StartBundle with a null check, and
that seems to be working.

If this really is a big deal, then it needs to be reported, so I'll try to
find time to write a broken example.

Jacob

On Thu, Nov 16, 2017 at 10:27 PM, Eugene Kirpichov <ki...@google.com>
wrote:

> Could you give more details, e.g. a code snippet that reproduces the
> issue, and describe how you determine that @Setup hasn't been called?

Re: @DoFn.Setup not called

Posted by Eugene Kirpichov <ki...@google.com>.
Could you give more details, e.g. a code snippet that reproduces the issue,
and describe how you determine that @Setup hasn't been called?

On Thu, Nov 16, 2017 at 6:58 PM Derek Hao Hu <ph...@gmail.com> wrote:

> I've been using DoFn.Setup method in Dataflow and it seems to be working
> fine.

Re: @DoFn.Setup not called

Posted by Derek Hao Hu <ph...@gmail.com>.
I've been using DoFn.Setup method in Dataflow and it seems to be working
fine.

On Thu, Nov 16, 2017 at 4:56 PM, Jacob Marble <jm...@kochava.com> wrote:

> This one is weird.
>
> A DoFn I wrote:
> - stateful
> - used plenty in a streaming pipeline
> - direct and dataflow runners
> - works fine
>
> Now:
> - new batch pipeline
> - @DoFn.Setup method not called
> - direct runner works properly (logs from setup method are output)
> - dataflow runner simply doesn't call the setup method
>
> Is this possibly a Beam misuse? Javadoc for DoFn.Setup doesn't hint at
> anything, so I'm suspecting Dataflow bug?
>
> Jacob
>



-- 
Derek Hao Hu

Software Engineer | Snapchat
Snap Inc.