Posted to dev@mxnet.apache.org by kellen sunderland <ke...@gmail.com> on 2018/01/08 17:27:17 UTC

[DISCUSS] Seeding and determinism on multi-gpu systems.

Hello MXNet devs,

I wanted to see what people thought about the following section of code, which
I think has some subtle pros and cons:
https://github.com/apache/incubator-mxnet/blob/d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188

Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
debugging non-determinism in his model training.

This functionality is well documented here:
https://mxnet.incubator.apache.org/api/python/ndarray.html#mxnet.random.seed
but I don't think the current API meets all use cases, due to this section:

"Random number generators in MXNet are device specific. Therefore, random
numbers generated from two devices can be different even if they are seeded
using the same seed."

I'm guessing this is a feature that makes distributed training easier in
MXNet: you wouldn't want to train the same model on each GPU. However, the
downside is that if you run unit tests on a multi-GPU system, or in a
training environment where you don't have control over which GPU you use,
you can't count on deterministic behaviour to assert results against.  I have
a feeling there are non-unit-test use cases where you'd also want
deterministic behaviour independent of which GPU your code happens to be
scheduled on.

How do others feel about this?  Would it make sense to have some optional
args in the seed call to have the seed-per-device functionality turned off?
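
For concreteness, here is a minimal sketch of the behaviour I mean (the
same_on_all_devices argument is hypothetical, purely to illustrate the shape
such an option could take):

import mxnet as mx

mx.random.seed(128)
a = mx.nd.random.normal(shape=(2, 2), ctx=mx.gpu(0))
b = mx.nd.random.normal(shape=(2, 2), ctx=mx.gpu(1))
# Today a and b differ even though the seed is the same, because each
# device's seed also mixes in its dev_id.  With an opt-in along the lines
# of mx.random.seed(128, same_on_all_devices=True) they could be identical.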

-Kellen

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by "Domhan, Tobias" <do...@amazon.de>.
Thanks for bringing this topic up, Kellen!

It would indeed be great if we could make sure to have reproducible results independent of the concrete device id someone is using.

Just a thought: how about decoupling the seeding from the concrete device ids, while still allowing a different seed for each of the devices, by deterministically mapping the device_ids to an internal id and basing the device seed off that, similar to the following pseudo-code:

def set_seed(devices, global_seed):
  for device_idx, device in enumerate(devices):
    # derive the device seed from the device's position in the list,
    # not from its device id
    set_device_seed(device=device, seed=device_idx + global_seed * magic)

set_seed([6, 8, 10], global_seed=42)

This way results would be consistent as long as you use the same setup, namely the same number of devices, but would be independent of the concrete device ids.

What do you think?


Also, regarding distributed training: the device_id could be identical across machines, so even today one needs to make sure to set the global seed to a different value on each host in order not to end up with the same random number generator on all of the machines.
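
A rough sketch of what I mean, reusing the set_seed pseudo-code from above
(host_rank and the multiplier are purely illustrative, not an existing API):

def host_global_seed(base_seed, host_rank):
    # give every host its own global seed, so identical device_ids on
    # different machines still produce uncorrelated sequences
    return base_seed + host_rank * 104729  # arbitrary large constant

set_seed([0, 1, 2, 3], global_seed=host_global_seed(42, host_rank=3))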


Tobi

On 1/9/18, 5:50 PM, "Chris Olivier" <cj...@gmail.com> wrote:

    This is what I was asking about:
    https://www.unix.com/man-page/POSIX/3posix/random/
    
    POSIX standard stating that given a seed, the output must be deterministic.
    
    On Tue, Jan 9, 2018 at 7:58 AM, kellen sunderland <
    kellen.sunderland@gmail.com> wrote:
    
    > Sorry if I'm misunderstanding your question here Chris.
    >
    > On Tue, Jan 9, 2018 at 4:58 PM, kellen sunderland <
    > kellen.sunderland@gmail.com> wrote:
    >
    > > I think the convention is that random generators in most modern languages
    > > are always seeded, and always deterministic.  If a user seed isn't
    > > supplied, implementations generally provide their own seed, which they
    > > attempt to make unique.  Often they generate a seed that takes into
    > account
    > > the current time.  This is at least the case for many mainstream
    > languages.
    > >
    > > Java implementation: https://docs.oracle.com/javase/8/docs/api/
    > > java/util/Random.html
    > > Remarks: "If two instances of Random are created with the same seed, and
    > > the same sequence of method calls is made for each, they will generate
    > and
    > > return identical sequences of numbers."
    > >
    > > C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
    > > Remarks: "Providing an identical seed value to different Random objects
    > > causes each instance to produce identical sequences of random numbers.
    > This
    > > is often done when testing apps that rely on random number generators."
    > >
    > > On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <cj...@gmail.com>
    > > wrote:
    > >
    > >> wait wait — i don’t think that random number generators should return
    > >> deterministic lists of numbers. i’m asking if something says it’s
    > supposed
    > >> to. i know they tend to, but my understanding is that they tend to
    > because
    > >> of the challenge of generating true random numbers from hardware.  IMHO
    > >> the
    > >> ideal random number generator would not return a determinaiticnset if
    > >> numbers regardless of seed.
    > >>
    > >> On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <
    > pedro.larroy.lists@gmail.com
    > >> >
    > >> wrote:
    > >>
    > >> > For enabling parallel deterministic testing we can set an environment
    > >> > variable and set the same seed on different devices for those cases
    > >> > where we want it, leaving the default as it is. I think this would be
    > >> > an easy solution that wouldn't change any behaviour in training on
    > >> > multi-gpu.
    > >> >
    > >> > On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
    > >> > <ke...@gmail.com> wrote:
    > >> > > Thanks Asmus, yes this is also the approach I would be in favour of.
    > >> I
    > >> > > think we should optionally allow the user to specify if they want
    > >> > > deterministic behaviour independent of the GPU they run on.  If
    > MXNet
    > >> is
    > >> > > going to support more arbitrary linear algabra operations I could
    > see
    > >> a
    > >> > lot
    > >> > > of use cases for this.  For example I want deterministic noise fed
    > >> into a
    > >> > > deep-RL simulation so that I can compare a few different algorithms
    > >> > without
    > >> > > variance, and do it in parallel on my machine (that happens to have
    > >> two
    > >> > > GPUs).
    > >> > >
    > >> > > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel
    > >> > <as...@yahoo.de.invalid>
    > >> > > wrote:
    > >> > >
    > >> > >>  The issue is tricky. Number generators should return deterministic
    > >> sets
    > >> > >> of numbers as Chris said, but that usually only applies to
    > >> > non-distributed
    > >> > >> systems. And to some extend, we have already a distributed system
    > as
    > >> > soon
    > >> > >> as one cpu and one gpu is involved.
    > >> > >> For the usual setup like distributed training, using different
    > seeds
    > >> on
    > >> > >> different devices is a must. You distribute a process that involves
    > >> > random
    > >> > >> number generation and that means that you absolutely have to ensure
    > >> that
    > >> > >> the sequences on the devices do not correlate. So this behaviour is
    > >> > >> intended and correct. We also can not guarantee that random number
    > >> > >> generation is deterministic when running on CPU vs. running on GPU.
    > >> > >> So what we are dealing here is generating repeatable results, when
    > >> the
    > >> > >> application/code section is running on a single GPU out of a bigger
    > >> set
    > >> > of
    > >> > >> available GPUs, but we do not have control on which one. The
    > crucial
    > >> > line
    > >> > >> in mxnet is this one (resource.cc):
    > >> > >>
    > >> > >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
    > >> > >> kRandMagic;
    > >> > >> Here I think it would make sense to add a switch that optionally
    > >> makes
    > >> > >> this setting independent of ctx.dev_id. But we would have to
    > document
    > >> > >> really well that this is solely meant for specific types of
    > >> > debugging/unit
    > >> > >> testing.
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>
    > >> > >>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
    > >> > >> cjolivier01@gmail.com> Folgendes geschrieben:
    > >> > >>
    > >> > >>  Is it explicitly defined somewhere that random number generators
    > >> should
    > >> > >> always return a deterministic set of numbers given the same seed,
    > or
    > >> is
    > >> > >> that just a side-effect of some hardware not having a better way to
    > >> > >> generate random numbers so they use a user-defined seed to kick off
    > >> the
    > >> > >> randomization starting point?
    > >> > >>
    > >> > >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
    > >> > >> kellen.sunderland@gmail.com> wrote:
    > >> > >>
    > >> > >> > Hello MXNet devs,
    > >> > >> >
    > >> > >> > I wanted to see what people thought about the follow section of
    > >> code,
    > >> > >> which
    > >> > >> > I think has some subtle pros/cons:
    > >> > >> > https://github.com/apache/incubator-mxnet/blob/
    > >> > >> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
    > >> > >> >
    > >> > >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent
    > >> some
    > >> > time
    > >> > >> > debugging non-determinism in his model training.
    > >> > >> >
    > >> > >> > This functionality is well documented here:
    > >> > >> > https://mxnet.incubator.apache.org/api/python/ndarray.
    > >> > >> > html#mxnet.random.seed
    > >> > >> > but I don't think the current api meets all use cases due to this
    > >> > >> section:
    > >> > >> >
    > >> > >> > "Random number generators in MXNet are device specific.
    > Therefore,
    > >> > random
    > >> > >> > numbers generated from two devices can be different even if they
    > >> are
    > >> > >> seeded
    > >> > >> > using the same seed."
    > >> > >> >
    > >> > >> > I'm guessing this is a feature that makes distributed training
    > >> easier
    > >> > in
    > >> > >> > MXNet, you wouldn't want to train the same model on each GPU.
    > >> However
    > >> > >> the
    > >> > >> > downside of this is that if you run unit tests on a multi-gpu
    > >> system,
    > >> > or
    > >> > >> in
    > >> > >> > a training environment where you don't have control over which
    > GPU
    > >> you
    > >> > >> use,
    > >> > >> > you can't count on deterministic behaviour which you can assert
    > >> > results
    > >> > >> > against.  I have a feeling there are non-unit test use cases
    > where
    > >> > you'd
    > >> > >> > also want deterministic behaviour independent of which gpu you
    > >> happen
    > >> > to
    > >> > >> > have your code scheduled to run on.
    > >> > >> >
    > >> > >> > How do others feel about this?  Would it make sense to have some
    > >> > optional
    > >> > >> > args in the seed call to have the seed-per-device functionality
    > >> turned
    > >> > >> off?
    > >> > >> >
    > >> > >> > -Kellen
    > >> > >> >
    > >> > >>
    > >> > >>
    > >> >
    > >>
    > >
    > >
    >
    


Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by Chris Olivier <cj...@gmail.com>.
This is what I was asking about:
https://www.unix.com/man-page/POSIX/3posix/random/

The POSIX standard states that, given a seed, the output must be deterministic.

On Tue, Jan 9, 2018 at 7:58 AM, kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Sorry if I'm misunderstanding your question here Chris.
>
> On Tue, Jan 9, 2018 at 4:58 PM, kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > I think the convention is that random generators in most modern languages
> > are always seeded, and always deterministic.  If a user seed isn't
> > supplied, implementations generally provide their own seed, which they
> > attempt to make unique.  Often they generate a seed that takes into
> account
> > the current time.  This is at least the case for many mainstream
> languages.
> >
> > Java implementation: https://docs.oracle.com/javase/8/docs/api/
> > java/util/Random.html
> > Remarks: "If two instances of Random are created with the same seed, and
> > the same sequence of method calls is made for each, they will generate
> and
> > return identical sequences of numbers."
> >
> > C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
> > Remarks: "Providing an identical seed value to different Random objects
> > causes each instance to produce identical sequences of random numbers.
> This
> > is often done when testing apps that rely on random number generators."
> >
> > On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> >> wait wait — i don’t think that random number generators should return
> >> deterministic lists of numbers. i’m asking if something says it’s
> supposed
> >> to. i know they tend to, but my understanding is that they tend to
> because
> >> of the challenge of generating true random numbers from hardware.  IMHO
> >> the
> >> ideal random number generator would not return a determinaiticnset if
> >> numbers regardless of seed.
> >>
> >> On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <
> pedro.larroy.lists@gmail.com
> >> >
> >> wrote:
> >>
> >> > For enabling parallel deterministic testing we can set an environment
> >> > variable and set the same seed on different devices for those cases
> >> > where we want it, leaving the default as it is. I think this would be
> >> > an easy solution that wouldn't change any behaviour in training on
> >> > multi-gpu.
> >> >
> >> > On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
> >> > <ke...@gmail.com> wrote:
> >> > > Thanks Asmus, yes this is also the approach I would be in favour of.
> >> I
> >> > > think we should optionally allow the user to specify if they want
> >> > > deterministic behaviour independent of the GPU they run on.  If
> MXNet
> >> is
> >> > > going to support more arbitrary linear algabra operations I could
> see
> >> a
> >> > lot
> >> > > of use cases for this.  For example I want deterministic noise fed
> >> into a
> >> > > deep-RL simulation so that I can compare a few different algorithms
> >> > without
> >> > > variance, and do it in parallel on my machine (that happens to have
> >> two
> >> > > GPUs).
> >> > >
> >> > > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel
> >> > <as...@yahoo.de.invalid>
> >> > > wrote:
> >> > >
> >> > >>  The issue is tricky. Number generators should return deterministic
> >> sets
> >> > >> of numbers as Chris said, but that usually only applies to
> >> > non-distributed
> >> > >> systems. And to some extend, we have already a distributed system
> as
> >> > soon
> >> > >> as one cpu and one gpu is involved.
> >> > >> For the usual setup like distributed training, using different
> seeds
> >> on
> >> > >> different devices is a must. You distribute a process that involves
> >> > random
> >> > >> number generation and that means that you absolutely have to ensure
> >> that
> >> > >> the sequences on the devices do not correlate. So this behaviour is
> >> > >> intended and correct. We also can not guarantee that random number
> >> > >> generation is deterministic when running on CPU vs. running on GPU.
> >> > >> So what we are dealing here is generating repeatable results, when
> >> the
> >> > >> application/code section is running on a single GPU out of a bigger
> >> set
> >> > of
> >> > >> available GPUs, but we do not have control on which one. The
> crucial
> >> > line
> >> > >> in mxnet is this one (resource.cc):
> >> > >>
> >> > >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
> >> > >> kRandMagic;
> >> > >> Here I think it would make sense to add a switch that optionally
> >> makes
> >> > >> this setting independent of ctx.dev_id. But we would have to
> document
> >> > >> really well that this is solely meant for specific types of
> >> > debugging/unit
> >> > >> testing.
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
> >> > >> cjolivier01@gmail.com> Folgendes geschrieben:
> >> > >>
> >> > >>  Is it explicitly defined somewhere that random number generators
> >> should
> >> > >> always return a deterministic set of numbers given the same seed,
> or
> >> is
> >> > >> that just a side-effect of some hardware not having a better way to
> >> > >> generate random numbers so they use a user-defined seed to kick off
> >> the
> >> > >> randomization starting point?
> >> > >>
> >> > >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
> >> > >> kellen.sunderland@gmail.com> wrote:
> >> > >>
> >> > >> > Hello MXNet devs,
> >> > >> >
> >> > >> > I wanted to see what people thought about the follow section of
> >> code,
> >> > >> which
> >> > >> > I think has some subtle pros/cons:
> >> > >> > https://github.com/apache/incubator-mxnet/blob/
> >> > >> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
> >> > >> >
> >> > >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent
> >> some
> >> > time
> >> > >> > debugging non-determinism in his model training.
> >> > >> >
> >> > >> > This functionality is well documented here:
> >> > >> > https://mxnet.incubator.apache.org/api/python/ndarray.
> >> > >> > html#mxnet.random.seed
> >> > >> > but I don't think the current api meets all use cases due to this
> >> > >> section:
> >> > >> >
> >> > >> > "Random number generators in MXNet are device specific.
> Therefore,
> >> > random
> >> > >> > numbers generated from two devices can be different even if they
> >> are
> >> > >> seeded
> >> > >> > using the same seed."
> >> > >> >
> >> > >> > I'm guessing this is a feature that makes distributed training
> >> easier
> >> > in
> >> > >> > MXNet, you wouldn't want to train the same model on each GPU.
> >> However
> >> > >> the
> >> > >> > downside of this is that if you run unit tests on a multi-gpu
> >> system,
> >> > or
> >> > >> in
> >> > >> > a training environment where you don't have control over which
> GPU
> >> you
> >> > >> use,
> >> > >> > you can't count on deterministic behaviour which you can assert
> >> > results
> >> > >> > against.  I have a feeling there are non-unit test use cases
> where
> >> > you'd
> >> > >> > also want deterministic behaviour independent of which gpu you
> >> happen
> >> > to
> >> > >> > have your code scheduled to run on.
> >> > >> >
> >> > >> > How do others feel about this?  Would it make sense to have some
> >> > optional
> >> > >> > args in the seed call to have the seed-per-device functionality
> >> turned
> >> > >> off?
> >> > >> >
> >> > >> > -Kellen
> >> > >> >
> >> > >>
> >> > >>
> >> >
> >>
> >
> >
>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by kellen sunderland <ke...@gmail.com>.
Sorry if I'm misunderstanding your question here, Chris.

On Tue, Jan 9, 2018 at 4:58 PM, kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> I think the convention is that random generators in most modern languages
> are always seeded, and always deterministic.  If a user seed isn't
> supplied, implementations generally provide their own seed, which they
> attempt to make unique.  Often they generate a seed that takes into account
> the current time.  This is at least the case for many mainstream languages.
>
> Java implementation: https://docs.oracle.com/javase/8/docs/api/
> java/util/Random.html
> Remarks: "If two instances of Random are created with the same seed, and
> the same sequence of method calls is made for each, they will generate and
> return identical sequences of numbers."
>
> C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
> Remarks: "Providing an identical seed value to different Random objects
> causes each instance to produce identical sequences of random numbers. This
> is often done when testing apps that rely on random number generators."
>
> On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <cj...@gmail.com>
> wrote:
>
>> wait wait — i don’t think that random number generators should return
>> deterministic lists of numbers. i’m asking if something says it’s supposed
>> to. i know they tend to, but my understanding is that they tend to because
>> of the challenge of generating true random numbers from hardware.  IMHO
>> the
>> ideal random number generator would not return a determinaiticnset if
>> numbers regardless of seed.
>>
>> On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <pedro.larroy.lists@gmail.com
>> >
>> wrote:
>>
>> > For enabling parallel deterministic testing we can set an environment
>> > variable and set the same seed on different devices for those cases
>> > where we want it, leaving the default as it is. I think this would be
>> > an easy solution that wouldn't change any behaviour in training on
>> > multi-gpu.
>> >
>> > On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
>> > <ke...@gmail.com> wrote:
>> > > Thanks Asmus, yes this is also the approach I would be in favour of.
>> I
>> > > think we should optionally allow the user to specify if they want
>> > > deterministic behaviour independent of the GPU they run on.  If MXNet
>> is
>> > > going to support more arbitrary linear algabra operations I could see
>> a
>> > lot
>> > > of use cases for this.  For example I want deterministic noise fed
>> into a
>> > > deep-RL simulation so that I can compare a few different algorithms
>> > without
>> > > variance, and do it in parallel on my machine (that happens to have
>> two
>> > > GPUs).
>> > >
>> > > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel
>> > <as...@yahoo.de.invalid>
>> > > wrote:
>> > >
>> > >>  The issue is tricky. Number generators should return deterministic
>> sets
>> > >> of numbers as Chris said, but that usually only applies to
>> > non-distributed
>> > >> systems. And to some extend, we have already a distributed system as
>> > soon
>> > >> as one cpu and one gpu is involved.
>> > >> For the usual setup like distributed training, using different seeds
>> on
>> > >> different devices is a must. You distribute a process that involves
>> > random
>> > >> number generation and that means that you absolutely have to ensure
>> that
>> > >> the sequences on the devices do not correlate. So this behaviour is
>> > >> intended and correct. We also can not guarantee that random number
>> > >> generation is deterministic when running on CPU vs. running on GPU.
>> > >> So what we are dealing here is generating repeatable results, when
>> the
>> > >> application/code section is running on a single GPU out of a bigger
>> set
>> > of
>> > >> available GPUs, but we do not have control on which one. The crucial
>> > line
>> > >> in mxnet is this one (resource.cc):
>> > >>
>> > >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
>> > >> kRandMagic;
>> > >> Here I think it would make sense to add a switch that optionally
>> makes
>> > >> this setting independent of ctx.dev_id. But we would have to document
>> > >> really well that this is solely meant for specific types of
>> > debugging/unit
>> > >> testing.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
>> > >> cjolivier01@gmail.com> Folgendes geschrieben:
>> > >>
>> > >>  Is it explicitly defined somewhere that random number generators
>> should
>> > >> always return a deterministic set of numbers given the same seed, or
>> is
>> > >> that just a side-effect of some hardware not having a better way to
>> > >> generate random numbers so they use a user-defined seed to kick off
>> the
>> > >> randomization starting point?
>> > >>
>> > >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
>> > >> kellen.sunderland@gmail.com> wrote:
>> > >>
>> > >> > Hello MXNet devs,
>> > >> >
>> > >> > I wanted to see what people thought about the follow section of
>> code,
>> > >> which
>> > >> > I think has some subtle pros/cons:
>> > >> > https://github.com/apache/incubator-mxnet/blob/
>> > >> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>> > >> >
>> > >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent
>> some
>> > time
>> > >> > debugging non-determinism in his model training.
>> > >> >
>> > >> > This functionality is well documented here:
>> > >> > https://mxnet.incubator.apache.org/api/python/ndarray.
>> > >> > html#mxnet.random.seed
>> > >> > but I don't think the current api meets all use cases due to this
>> > >> section:
>> > >> >
>> > >> > "Random number generators in MXNet are device specific. Therefore,
>> > random
>> > >> > numbers generated from two devices can be different even if they
>> are
>> > >> seeded
>> > >> > using the same seed."
>> > >> >
>> > >> > I'm guessing this is a feature that makes distributed training
>> easier
>> > in
>> > >> > MXNet, you wouldn't want to train the same model on each GPU.
>> However
>> > >> the
>> > >> > downside of this is that if you run unit tests on a multi-gpu
>> system,
>> > or
>> > >> in
>> > >> > a training environment where you don't have control over which GPU
>> you
>> > >> use,
>> > >> > you can't count on deterministic behaviour which you can assert
>> > results
>> > >> > against.  I have a feeling there are non-unit test use cases where
>> > you'd
>> > >> > also want deterministic behaviour independent of which gpu you
>> happen
>> > to
>> > >> > have your code scheduled to run on.
>> > >> >
>> > >> > How do others feel about this?  Would it make sense to have some
>> > optional
>> > >> > args in the seed call to have the seed-per-device functionality
>> turned
>> > >> off?
>> > >> >
>> > >> > -Kellen
>> > >> >
>> > >>
>> > >>
>> >
>>
>
>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by kellen sunderland <ke...@gmail.com>.
I think the convention is that random generators in most modern languages
are always seeded, and always deterministic.  If a user seed isn't
supplied, implementations generally provide their own seed, which they
attempt to make unique.  Often they generate a seed that takes into account
the current time.  This is at least the case for many mainstream languages.

Java implementation:
https://docs.oracle.com/javase/8/docs/api/java/util/Random.html
Remarks: "If two instances of Random are created with the same seed, and
the same sequence of method calls is made for each, they will generate and
return identical sequences of numbers."

C#: https://msdn.microsoft.com/en-us/library/ctssatww(v=vs.110).aspx
Remarks: "Providing an identical seed value to different Random objects
causes each instance to produce identical sequences of random numbers. This
is often done when testing apps that rely on random number generators."
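
Python's standard library follows the same convention, for what it's worth;
a quick illustration (not MXNet-specific):

import random

a = random.Random(42)
b = random.Random(42)
# two generators created with the same seed produce identical sequences
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]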

On Tue, Jan 9, 2018 at 4:27 PM, Chris Olivier <cj...@gmail.com> wrote:

> wait wait — i don’t think that random number generators should return
> deterministic lists of numbers. i’m asking if something says it’s supposed
> to. i know they tend to, but my understanding is that they tend to because
> of the challenge of generating true random numbers from hardware.  IMHO the
> ideal random number generator would not return a determinaiticnset if
> numbers regardless of seed.
>
> On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <pe...@gmail.com>
> wrote:
>
> > For enabling parallel deterministic testing we can set an environment
> > variable and set the same seed on different devices for those cases
> > where we want it, leaving the default as it is. I think this would be
> > an easy solution that wouldn't change any behaviour in training on
> > multi-gpu.
> >
> > On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
> > <ke...@gmail.com> wrote:
> > > Thanks Asmus, yes this is also the approach I would be in favour of.  I
> > > think we should optionally allow the user to specify if they want
> > > deterministic behaviour independent of the GPU they run on.  If MXNet
> is
> > > going to support more arbitrary linear algabra operations I could see a
> > lot
> > > of use cases for this.  For example I want deterministic noise fed
> into a
> > > deep-RL simulation so that I can compare a few different algorithms
> > without
> > > variance, and do it in parallel on my machine (that happens to have two
> > > GPUs).
> > >
> > > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel
> > <as...@yahoo.de.invalid>
> > > wrote:
> > >
> > >>  The issue is tricky. Number generators should return deterministic
> sets
> > >> of numbers as Chris said, but that usually only applies to
> > non-distributed
> > >> systems. And to some extend, we have already a distributed system as
> > soon
> > >> as one cpu and one gpu is involved.
> > >> For the usual setup like distributed training, using different seeds
> on
> > >> different devices is a must. You distribute a process that involves
> > random
> > >> number generation and that means that you absolutely have to ensure
> that
> > >> the sequences on the devices do not correlate. So this behaviour is
> > >> intended and correct. We also can not guarantee that random number
> > >> generation is deterministic when running on CPU vs. running on GPU.
> > >> So what we are dealing here is generating repeatable results, when the
> > >> application/code section is running on a single GPU out of a bigger
> set
> > of
> > >> available GPUs, but we do not have control on which one. The crucial
> > line
> > >> in mxnet is this one (resource.cc):
> > >>
> > >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
> > >> kRandMagic;
> > >> Here I think it would make sense to add a switch that optionally makes
> > >> this setting independent of ctx.dev_id. But we would have to document
> > >> really well that this is solely meant for specific types of
> > debugging/unit
> > >> testing.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
> > >> cjolivier01@gmail.com> Folgendes geschrieben:
> > >>
> > >>  Is it explicitly defined somewhere that random number generators
> should
> > >> always return a deterministic set of numbers given the same seed, or
> is
> > >> that just a side-effect of some hardware not having a better way to
> > >> generate random numbers so they use a user-defined seed to kick off
> the
> > >> randomization starting point?
> > >>
> > >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
> > >> kellen.sunderland@gmail.com> wrote:
> > >>
> > >> > Hello MXNet devs,
> > >> >
> > >> > I wanted to see what people thought about the follow section of
> code,
> > >> which
> > >> > I think has some subtle pros/cons:
> > >> > https://github.com/apache/incubator-mxnet/blob/
> > >> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
> > >> >
> > >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent some
> > time
> > >> > debugging non-determinism in his model training.
> > >> >
> > >> > This functionality is well documented here:
> > >> > https://mxnet.incubator.apache.org/api/python/ndarray.
> > >> > html#mxnet.random.seed
> > >> > but I don't think the current api meets all use cases due to this
> > >> section:
> > >> >
> > >> > "Random number generators in MXNet are device specific. Therefore,
> > random
> > >> > numbers generated from two devices can be different even if they are
> > >> seeded
> > >> > using the same seed."
> > >> >
> > >> > I'm guessing this is a feature that makes distributed training
> easier
> > in
> > >> > MXNet, you wouldn't want to train the same model on each GPU.
> However
> > >> the
> > >> > downside of this is that if you run unit tests on a multi-gpu
> system,
> > or
> > >> in
> > >> > a training environment where you don't have control over which GPU
> you
> > >> use,
> > >> > you can't count on deterministic behaviour which you can assert
> > results
> > >> > against.  I have a feeling there are non-unit test use cases where
> > you'd
> > >> > also want deterministic behaviour independent of which gpu you
> happen
> > to
> > >> > have your code scheduled to run on.
> > >> >
> > >> > How do others feel about this?  Would it make sense to have some
> > optional
> > >> > args in the seed call to have the seed-per-device functionality
> turned
> > >> off?
> > >> >
> > >> > -Kellen
> > >> >
> > >>
> > >>
> >
>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by Chris Olivier <cj...@gmail.com>.
wait wait — I don't think that random number generators should return
deterministic lists of numbers. I'm asking if something says they're supposed
to. I know they tend to, but my understanding is that they tend to because
of the challenge of generating true random numbers from hardware.  IMHO the
ideal random number generator would not return a deterministic set of
numbers regardless of seed.

On Tue, Jan 9, 2018 at 3:43 AM Pedro Larroy <pe...@gmail.com>
wrote:

> For enabling parallel deterministic testing we can set an environment
> variable and set the same seed on different devices for those cases
> where we want it, leaving the default as it is. I think this would be
> an easy solution that wouldn't change any behaviour in training on
> multi-gpu.
>
> On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
> <ke...@gmail.com> wrote:
> > Thanks Asmus, yes this is also the approach I would be in favour of.  I
> > think we should optionally allow the user to specify if they want
> > deterministic behaviour independent of the GPU they run on.  If MXNet is
> > going to support more arbitrary linear algabra operations I could see a
> lot
> > of use cases for this.  For example I want deterministic noise fed into a
> > deep-RL simulation so that I can compare a few different algorithms
> without
> > variance, and do it in parallel on my machine (that happens to have two
> > GPUs).
> >
> > On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel
> <as...@yahoo.de.invalid>
> > wrote:
> >
> >>  The issue is tricky. Number generators should return deterministic sets
> >> of numbers as Chris said, but that usually only applies to
> non-distributed
> >> systems. And to some extend, we have already a distributed system as
> soon
> >> as one cpu and one gpu is involved.
> >> For the usual setup like distributed training, using different seeds on
> >> different devices is a must. You distribute a process that involves
> random
> >> number generation and that means that you absolutely have to ensure that
> >> the sequences on the devices do not correlate. So this behaviour is
> >> intended and correct. We also can not guarantee that random number
> >> generation is deterministic when running on CPU vs. running on GPU.
> >> So what we are dealing here is generating repeatable results, when the
> >> application/code section is running on a single GPU out of a bigger set
> of
> >> available GPUs, but we do not have control on which one. The crucial
> line
> >> in mxnet is this one (resource.cc):
> >>
> >> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
> >> kRandMagic;
> >> Here I think it would make sense to add a switch that optionally makes
> >> this setting independent of ctx.dev_id. But we would have to document
> >> really well that this is solely meant for specific types of
> debugging/unit
> >> testing.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
> >> cjolivier01@gmail.com> Folgendes geschrieben:
> >>
> >>  Is it explicitly defined somewhere that random number generators should
> >> always return a deterministic set of numbers given the same seed, or is
> >> that just a side-effect of some hardware not having a better way to
> >> generate random numbers so they use a user-defined seed to kick off the
> >> randomization starting point?
> >>
> >> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
> >> kellen.sunderland@gmail.com> wrote:
> >>
> >> > Hello MXNet devs,
> >> >
> >> > I wanted to see what people thought about the follow section of code,
> >> which
> >> > I think has some subtle pros/cons:
> >> > https://github.com/apache/incubator-mxnet/blob/
> >> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
> >> >
> >> > Tobi (tdomhan) from sockeye pointed it out to me after he spent some
> time
> >> > debugging non-determinism in his model training.
> >> >
> >> > This functionality is well documented here:
> >> > https://mxnet.incubator.apache.org/api/python/ndarray.
> >> > html#mxnet.random.seed
> >> > but I don't think the current api meets all use cases due to this
> >> section:
> >> >
> >> > "Random number generators in MXNet are device specific. Therefore,
> random
> >> > numbers generated from two devices can be different even if they are
> >> seeded
> >> > using the same seed."
> >> >
> >> > I'm guessing this is a feature that makes distributed training easier
> in
> >> > MXNet, you wouldn't want to train the same model on each GPU.  However
> >> the
> >> > downside of this is that if you run unit tests on a multi-gpu system,
> or
> >> in
> >> > a training environment where you don't have control over which GPU you
> >> use,
> >> > you can't count on deterministic behaviour which you can assert
> results
> >> > against.  I have a feeling there are non-unit test use cases where
> you'd
> >> > also want deterministic behaviour independent of which gpu you happen
> to
> >> > have your code scheduled to run on.
> >> >
> >> > How do others feel about this?  Would it make sense to have some
> optional
> >> > args in the seed call to have the seed-per-device functionality turned
> >> off?
> >> >
> >> > -Kellen
> >> >
> >>
> >>
>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by Pedro Larroy <pe...@gmail.com>.
To enable parallel deterministic testing, we could set an environment
variable that applies the same seed on different devices for those cases
where we want it, leaving the default as it is. I think this would be
an easy solution that wouldn't change any behaviour for multi-GPU
training.

On Tue, Jan 9, 2018 at 10:48 AM, kellen sunderland
<ke...@gmail.com> wrote:
> Thanks Asmus, yes this is also the approach I would be in favour of.  I
> think we should optionally allow the user to specify if they want
> deterministic behaviour independent of the GPU they run on.  If MXNet is
> going to support more arbitrary linear algabra operations I could see a lot
> of use cases for this.  For example I want deterministic noise fed into a
> deep-RL simulation so that I can compare a few different algorithms without
> variance, and do it in parallel on my machine (that happens to have two
> GPUs).
>
> On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel <as...@yahoo.de.invalid>
> wrote:
>
>>  The issue is tricky. Number generators should return deterministic sets
>> of numbers as Chris said, but that usually only applies to non-distributed
>> systems. And to some extend, we have already a distributed system as soon
>> as one cpu and one gpu is involved.
>> For the usual setup like distributed training, using different seeds on
>> different devices is a must. You distribute a process that involves random
>> number generation and that means that you absolutely have to ensure that
>> the sequences on the devices do not correlate. So this behaviour is
>> intended and correct. We also can not guarantee that random number
>> generation is deterministic when running on CPU vs. running on GPU.
>> So what we are dealing here is generating repeatable results, when the
>> application/code section is running on a single GPU out of a bigger set of
>> available GPUs, but we do not have control on which one. The crucial line
>> in mxnet is this one (resource.cc):
>>
>> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
>> kRandMagic;
>> Here I think it would make sense to add a switch that optionally makes
>> this setting independent of ctx.dev_id. But we would have to document
>> really well that this is solely meant for specific types of debugging/unit
>> testing.
>>
>>
>>
>>
>>
>>
>>
>>
>>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
>> cjolivier01@gmail.com> Folgendes geschrieben:
>>
>>  Is it explicitly defined somewhere that random number generators should
>> always return a deterministic set of numbers given the same seed, or is
>> that just a side-effect of some hardware not having a better way to
>> generate random numbers so they use a user-defined seed to kick off the
>> randomization starting point?
>>
>> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
>> kellen.sunderland@gmail.com> wrote:
>>
>> > Hello MXNet devs,
>> >
>> > I wanted to see what people thought about the follow section of code,
>> which
>> > I think has some subtle pros/cons:
>> > https://github.com/apache/incubator-mxnet/blob/
>> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>> >
>> > Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
>> > debugging non-determinism in his model training.
>> >
>> > This functionality is well documented here:
>> > https://mxnet.incubator.apache.org/api/python/ndarray.
>> > html#mxnet.random.seed
>> > but I don't think the current api meets all use cases due to this
>> section:
>> >
>> > "Random number generators in MXNet are device specific. Therefore, random
>> > numbers generated from two devices can be different even if they are
>> seeded
>> > using the same seed."
>> >
>> > I'm guessing this is a feature that makes distributed training easier in
>> > MXNet, you wouldn't want to train the same model on each GPU.  However
>> the
>> > downside of this is that if you run unit tests on a multi-gpu system, or
>> in
>> > a training environment where you don't have control over which GPU you
>> use,
>> > you can't count on deterministic behaviour which you can assert results
>> > against.  I have a feeling there are non-unit test use cases where you'd
>> > also want deterministic behaviour independent of which gpu you happen to
>> > have your code scheduled to run on.
>> >
>> > How do others feel about this?  Would it make sense to have some optional
>> > args in the seed call to have the seed-per-device functionality turned
>> off?
>> >
>> > -Kellen
>> >
>>
>>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by kellen sunderland <ke...@gmail.com>.
Thanks Asmus, yes, this is also the approach I would be in favour of.  I
think we should optionally allow the user to specify whether they want
deterministic behaviour independent of the GPU they run on.  If MXNet is
going to support more arbitrary linear algebra operations, I could see a lot
of use cases for this.  For example, I want deterministic noise fed into a
deep-RL simulation so that I can compare a few different algorithms without
variance, and do it in parallel on my machine (which happens to have two
GPUs).

On Tue, Jan 9, 2018 at 10:36 AM, Asmus Hetzel <as...@yahoo.de.invalid>
wrote:

>  The issue is tricky. Number generators should return deterministic sets
> of numbers as Chris said, but that usually only applies to non-distributed
> systems. And to some extend, we have already a distributed system as soon
> as one cpu and one gpu is involved.
> For the usual setup like distributed training, using different seeds on
> different devices is a must. You distribute a process that involves random
> number generation and that means that you absolutely have to ensure that
> the sequences on the devices do not correlate. So this behaviour is
> intended and correct. We also can not guarantee that random number
> generation is deterministic when running on CPU vs. running on GPU.
> So what we are dealing here is generating repeatable results, when the
> application/code section is running on a single GPU out of a bigger set of
> available GPUs, but we do not have control on which one. The crucial line
> in mxnet is this one (resource.cc):
>
> const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed *
> kRandMagic;
> Here I think it would make sense to add a switch that optionally makes
> this setting independent of ctx.dev_id. But we would have to document
> really well that this is solely meant for specific types of debugging/unit
> testing.
>
>
>
>
>
>
>
>
>     Am Montag, 8. Januar 2018, 19:30:02 MEZ hat Chris Olivier <
> cjolivier01@gmail.com> Folgendes geschrieben:
>
>  Is it explicitly defined somewhere that random number generators should
> always return a deterministic set of numbers given the same seed, or is
> that just a side-effect of some hardware not having a better way to
> generate random numbers so they use a user-defined seed to kick off the
> randomization starting point?
>
> On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
> kellen.sunderland@gmail.com> wrote:
>
> > Hello MXNet devs,
> >
> > I wanted to see what people thought about the follow section of code,
> which
> > I think has some subtle pros/cons:
> > https://github.com/apache/incubator-mxnet/blob/
> > d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
> >
> > Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
> > debugging non-determinism in his model training.
> >
> > This functionality is well documented here:
> > https://mxnet.incubator.apache.org/api/python/ndarray.
> > html#mxnet.random.seed
> > but I don't think the current api meets all use cases due to this
> section:
> >
> > "Random number generators in MXNet are device specific. Therefore, random
> > numbers generated from two devices can be different even if they are
> seeded
> > using the same seed."
> >
> > I'm guessing this is a feature that makes distributed training easier in
> > MXNet, you wouldn't want to train the same model on each GPU.  However
> the
> > downside of this is that if you run unit tests on a multi-gpu system, or
> in
> > a training environment where you don't have control over which GPU you
> use,
> > you can't count on deterministic behaviour which you can assert results
> > against.  I have a feeling there are non-unit test use cases where you'd
> > also want deterministic behaviour independent of which gpu you happen to
> > have your code scheduled to run on.
> >
> > How do others feel about this?  Would it make sense to have some optional
> > args in the seed call to have the seed-per-device functionality turned
> off?
> >
> > -Kellen
> >
>
>

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by Asmus Hetzel <as...@yahoo.de.INVALID>.
The issue is tricky. Number generators should return deterministic sets of numbers, as Chris said, but that usually only applies to non-distributed systems. And to some extent, we already have a distributed system as soon as one CPU and one GPU are involved.
For the usual setup like distributed training, using different seeds on different devices is a must. You distribute a process that involves random number generation, and that means that you absolutely have to ensure that the sequences on the devices do not correlate. So this behaviour is intended and correct. We also cannot guarantee that random number generation is deterministic when running on CPU vs. running on GPU.
So what we are dealing with here is generating repeatable results when the application/code section is running on a single GPU out of a bigger set of available GPUs, but we do not have control over which one. The crucial line in mxnet is this one (resource.cc):

const uint32_t seed = ctx.dev_id + i * kMaxNumGPUs + global_seed * kRandMagic;

Here I think it would make sense to add a switch that optionally makes this setting independent of ctx.dev_id. But we would have to document really well that this is solely meant for specific types of debugging/unit testing.
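
A Python restatement of that derivation, with a hypothetical switch bolted
on, just to illustrate the idea (the environment variable name and the
constant values are made up for this sketch, not existing MXNet behaviour):

import os

K_MAX_NUM_GPUS = 16  # stands in for kMaxNumGPUs
K_RAND_MAGIC = 19    # stands in for kRandMagic

def device_seed(dev_id, i, global_seed):
    # hypothetical opt-in: ignore the device id, so every GPU derives the
    # same seed; solely intended for debugging/unit testing
    if os.environ.get("MXNET_SEED_IGNORE_DEVICE_ID", "0") == "1":
        dev_id = 0
    # mask to 32 bits to mimic the uint32_t arithmetic in resource.cc
    return (dev_id + i * K_MAX_NUM_GPUS + global_seed * K_RAND_MAGIC) & 0xFFFFFFFF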








    On Monday, 8 January 2018 at 19:30:02 CET, Chris Olivier <cj...@gmail.com> wrote:
 
 Is it explicitly defined somewhere that random number generators should
always return a deterministic set of numbers given the same seed, or is
that just a side-effect of some hardware not having a better way to
generate random numbers so they use a user-defined seed to kick off the
randomization starting point?

On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Hello MXNet devs,
>
> I wanted to see what people thought about the follow section of code, which
> I think has some subtle pros/cons:
> https://github.com/apache/incubator-mxnet/blob/
> d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>
> Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
> debugging non-determinism in his model training.
>
> This functionality is well documented here:
> https://mxnet.incubator.apache.org/api/python/ndarray.
> html#mxnet.random.seed
> but I don't think the current api meets all use cases due to this section:
>
> "Random number generators in MXNet are device specific. Therefore, random
> numbers generated from two devices can be different even if they are seeded
> using the same seed."
>
> I'm guessing this is a feature that makes distributed training easier in
> MXNet, you wouldn't want to train the same model on each GPU.  However the
> downside of this is that if you run unit tests on a multi-gpu system, or in
> a training environment where you don't have control over which GPU you use,
> you can't count on deterministic behaviour which you can assert results
> against.  I have a feeling there are non-unit test use cases where you'd
> also want deterministic behaviour independent of which gpu you happen to
> have your code scheduled to run on.
>
> How do others feel about this?  Would it make sense to have some optional
> args in the seed call to have the seed-per-device functionality turned off?
>
> -Kellen
>
  

Re: [DISCUSS] Seeding and determinism on multi-gpu systems.

Posted by Chris Olivier <cj...@gmail.com>.
Is it explicitly defined somewhere that random number generators should
always return a deterministic set of numbers given the same seed, or is
that just a side-effect of some hardware not having a better way to
generate random numbers so they use a user-defined seed to kick off the
randomization starting point?

On Mon, Jan 8, 2018 at 9:27 AM, kellen sunderland <
kellen.sunderland@gmail.com> wrote:

> Hello MXNet devs,
>
> I wanted to see what people thought about the follow section of code, which
> I think has some subtle pros/cons:
> https://github.com/apache/incubator-mxnet/blob/
> d2a856a3a2abb4e72edc301b8b821f0b75f30722/src/resource.cc#L188
>
> Tobi (tdomhan) from sockeye pointed it out to me after he spent some time
> debugging non-determinism in his model training.
>
> This functionality is well documented here:
> https://mxnet.incubator.apache.org/api/python/ndarray.
> html#mxnet.random.seed
> but I don't think the current api meets all use cases due to this section:
>
> "Random number generators in MXNet are device specific. Therefore, random
> numbers generated from two devices can be different even if they are seeded
> using the same seed."
>
> I'm guessing this is a feature that makes distributed training easier in
> MXNet, you wouldn't want to train the same model on each GPU.  However the
> downside of this is that if you run unit tests on a multi-gpu system, or in
> a training environment where you don't have control over which GPU you use,
> you can't count on deterministic behaviour which you can assert results
> against.  I have a feeling there are non-unit test use cases where you'd
> also want deterministic behaviour independent of which gpu you happen to
> have your code scheduled to run on.
>
> How do others feel about this?  Would it make sense to have some optional
> args in the seed call to have the seed-per-device functionality turned off?
>
> -Kellen
>