Posted to user@beam.apache.org by Jacob Marble <jm...@kochava.com> on 2017/11/19 16:17:11 UTC

unique DoFn id

Is there a recommended way to get a unique id for each instance of a DoFn?

- DataflowWorkerHarnessOptions.getWorkerId() only returns a unique id per
worker, which can contain multiple instances of a DoFn.
- Looks like ThreadLocalRandom is seeded with the same value on every
instance
- Thinking I'll try workerId + construction timestamp next

Jacob

Re: unique DoFn id

Posted by Jacob Marble <jm...@kochava.com>.
That helps, thanks.

On Sun, Nov 19, 2017 at 7:29 PM Eugene Kirpichov <ki...@google.com>
wrote:

> That's correct. DoFns are serialized in the pipeline description and
> shipped to workers and deserialized there. Standard Java serialization is
> used, and Java serialization doesn't call the constructor - it directly
> creates an instance of the class (even if it doesn't declare a default
> constructor) and repopulates fields.

Jacob

Re: unique DoFn id

Posted by Eugene Kirpichov <ki...@google.com>.
That's correct. DoFns are serialized in the pipeline description and
shipped to workers and deserialized there. Standard Java serialization is
used, and Java serialization doesn't call the constructor - it directly
creates an instance of the class (even if it doesn't declare a default
constructor) and repopulates fields.
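
A minimal sketch of this behavior using plain java.io serialization, which stands in here for however the runner actually encodes and ships the pipeline. ConstructorIdFn and the serialize/deserialize helpers are hypothetical names, not Beam API. The UUID is assigned in the constructor, the object is serialized once, and two copies are deserialized from the same bytes. They are distinct objects but report the same id, because deserialization restores the field rather than rerunning the constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.UUID;

public class ConstructorIdDemo {

    // Hypothetical stand-in for a DoFn that sets its id in the constructor.
    static class ConstructorIdFn implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;

        ConstructorIdFn() {
            // Runs once, when the object is built before being serialized.
            id = UUID.randomUUID().toString();
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject(); // restores fields; ConstructorIdFn() is not called
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] shipped = serialize(new ConstructorIdFn()); // serialized once
        ConstructorIdFn a = (ConstructorIdFn) deserialize(shipped);
        ConstructorIdFn b = (ConstructorIdFn) deserialize(shipped);
        System.out.println(a != b);            // true: two distinct instances
        System.out.println(a.id.equals(b.id)); // true: both carry the constructor's id
    }
}
```

Strictly speaking, Java deserialization does run the no-arg constructor of the nearest non-serializable superclass (here Object), but never the constructor of the serializable class itself, so anything assigned there is frozen into the serialized form.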

On Sun, Nov 19, 2017, 7:07 PM Jacob Marble <jm...@kochava.com> wrote:

> Eugene, that worked. Can you explain why this doesn't work when I set the
> UUID (or random value) from the constructor?
>
> It looks like the DoFn constructor is called once by the worker, then that
> constructed object is copied as many times as needed, each instance getting
> its own thread and @Setup, @StartBundle, etc. loop. Is that correct?
>
> Thanks for the help.
>
> Jacob

Re: unique DoFn id

Posted by Jacob Marble <jm...@kochava.com>.
Eugene, that worked. Can you explain why this doesn't work when I set the
UUID (or random value) from the constructor?

It looks like the DoFn constructor is called once by the worker, then that
constructed object is copied as many times as needed, each instance getting
its own thread and @Setup, @StartBundle, etc. loop. Is that correct?

Thanks for the help.

Jacob

On Sun, Nov 19, 2017 at 10:24 AM, Eugene Kirpichov <ki...@google.com>
wrote:

> You could create a private variable with a UUID, filled in in @Setup or
> (if you're hitting that bug where @Setup wasn't being called) in
> readObject()?

Re: unique DoFn id

Posted by Eugene Kirpichov <ki...@google.com>.
You could create a private variable with a UUID, filled in in @Setup or (if
you're hitting that bug where @Setup wasn't being called) in readObject()?
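
A minimal sketch of the readObject() fallback, using only java.io so it runs without Beam on the classpath. PerInstanceIdFn and the serialize/deserialize helpers are hypothetical stand-ins, not Beam classes; in an actual DoFn the same UUID assignment would normally go in a method annotated with @Setup. The field is transient, so the id is never written into the serialized form, and UUID.randomUUID() uses a cryptographically strong generator rather than ThreadLocalRandom.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.UUID;

public class PerInstanceIdDemo {

    // Hypothetical stand-in for a DoFn (it does not extend Beam's DoFn).
    static class PerInstanceIdFn implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: not serialized, so copies never share a value.
        private transient String id;

        // Called by Java serialization while each copy is being deserialized.
        // In a real DoFn, a method annotated with @Setup is the usual place for this.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();            // restore the non-transient fields
            id = UUID.randomUUID().toString(); // then give this copy its own id
        }

        String id() {
            return id;
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] shipped = serialize(new PerInstanceIdFn()); // serialized once
        PerInstanceIdFn a = (PerInstanceIdFn) deserialize(shipped);
        PerInstanceIdFn b = (PerInstanceIdFn) deserialize(shipped);
        System.out.println(a.id().equals(b.id())); // false: each copy has its own id
    }
}
```

Note that the original, never-serialized instance keeps a null id in this sketch; only the deserialized copies, which are the ones that would do the work on a worker, get one.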

On Sun, Nov 19, 2017 at 8:17 AM Jacob Marble <jm...@kochava.com> wrote:

> Is there a recommended way to get a unique id for each instance of a DoFn?
>
> - DataflowWorkerHarnessOptions.getWorkerId() only returns a unique id per
> worker, which can contain multiple instances of a DoFn.
> - Looks like ThreadLocalRandom is seeded with the same value on every
> instance
> - Thinking I'll try workerId + construction timestamp next
>
> Jacob
>