Posted to dev@singa.apache.org by Richard Platania <rd...@gmail.com> on 2016/05/11 23:07:22 UTC

Question about data parallelism within group

It is my understanding that data parallelism within a group should split
the batch evenly among the workers in the group. However, I noticed that
each worker is loading the exact same records. For example, consider a
batch size of 10, two workers in a group, and a partition dimension of 0
(the batch dimension) on the network. I would expect the first and second
workers to be given records 0-4 and 5-9 respectively. Instead, both
workers load a copy of records 0-4.
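For reference, the split described above (worker k receives the k-th contiguous slice of the batch along dimension 0) can be sketched as follows. `split_batch` is a hypothetical helper written only to illustrate the expected behaviour; it is not a SINGA function.

```python
def split_batch(batch, num_workers):
    """Hypothetical helper: split a batch along dimension 0 into
    contiguous, near-equal slices, one per worker in the group."""
    base, extra = divmod(len(batch), num_workers)
    slices, start = [], 0
    for k in range(num_workers):
        # The first `extra` workers take one additional record when the
        # batch size is not divisible by the number of workers.
        size = base + (1 if k < extra else 0)
        slices.append(batch[start:start + size])
        start += size
    return slices


batch = list(range(10))          # batch size 10
print(split_batch(batch, 2))     # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```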

If this is intended, it would be great if someone could explain why the
data parallelism configuration causes multiple workers in a group to
receive the same records.

Regards,
Richard Platania

Re: Question about data parallelism within group

Posted by Wang Wei <wa...@comp.nus.edu.sg>.
Hi Richard,

Yes, you are correct.
Each worker loads its 5 records (half of the batch of 10) independently
from the same data path. To make them load different records, we use a
small trick: each worker starts reading at a random offset, so the two
workers (very likely) begin at different positions. This is set as
```
store_conf {
  ...
  random_skip: 5000  # the starting offset is a random number in [0, 5000)
}
```
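The effect of `random_skip` can be modelled roughly as below. This is a minimal sketch under stated assumptions, not SINGA's implementation: it assumes each worker draws its offset uniformly from [0, random_skip) once at start-up, skips that many records, then reads sequentially and wraps around at the end of the data. `worker_stream` and `next_batch` are hypothetical names.

```python
import random


def worker_stream(records, random_skip, rng):
    """Hypothetical model of one worker's reader.

    Skips a random number of records in [0, random_skip) once, then yields
    records sequentially, wrapping to the start at the end of the data.
    With random_skip == 0 every worker starts at record 0.
    """
    offset = rng.randrange(random_skip) if random_skip > 0 else 0
    pos = offset % len(records)
    while True:
        yield records[pos]
        pos = (pos + 1) % len(records)


def next_batch(stream, size):
    """Read `size` consecutive records from a worker's stream."""
    return [next(stream) for _ in range(size)]


records = list(range(60000))
rng = random.Random(42)

# Two workers reading the same data path, each with its own random offset.
w0 = worker_stream(records, 5000, rng)
w1 = worker_stream(records, 5000, rng)
print(next_batch(w0, 5), next_batch(w1, 5))
```

Because the offsets are drawn independently, the trick makes duplicate records unlikely rather than impossible: two workers whose offsets land within a few records of each other will still read overlapping data.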

regards,
Wei
