Posted to dev@singa.apache.org by Richard Platania <rd...@gmail.com> on 2016/05/11 23:07:22 UTC
Question about data parallelism within group
It is my understanding that data parallelism within a group should split
the batch evenly among the workers in the group. However, I noticed that
each worker is loading the exact same records. For example, consider a
batch size of 10, two workers in a group, and partition dimension of 0
(batch dimension) on the network. I would expect the first and second
worker to be given records 0-4 and 5-9 respectively. Instead, both workers
load a copy of records 0-4.
If this is intended, it would be great if someone could explain why the data
parallelism configuration causes multiple workers in a group to load
the same records.
Regards,
Richard Platania
Re: Question about data parallelism within group
Posted by Wang Wei <wa...@comp.nus.edu.sg>.
Hi Richard,
Yes, you are correct.
Each worker loads its 5 records independently from the same data path.
To make them load different records, we use a small trick: each worker is
given a different random starting offset, which is configured as
```
store_conf {
...
random_skip: 5000  # the starting offset is a random number in [0, 5000)
}
```
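The effect of random_skip can be illustrated with a small simulation. This is
only a sketch, not SINGA code: the function name worker_records, the data set
size of 60000, and the wrap-around at the end of the data are assumptions made
for illustration. It models each worker skipping a random number of records in
[0, random_skip) and then reading its share of the batch sequentially.

```python
import random


def worker_records(num_records, per_worker, random_skip, rng):
    """Return the record indices one worker reads for its first mini-batch.

    Hypothetical model: the worker skips a random number of records in
    [0, random_skip), then reads `per_worker` records sequentially,
    wrapping around at the end of the data (the wrap-around is an assumption).
    """
    start = rng.randrange(random_skip) if random_skip > 0 else 0
    return [(start + i) % num_records for i in range(per_worker)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable

# Without random_skip, both workers start at record 0 and read the same slice.
no_skip = [worker_records(60000, 5, 0, rng) for _ in range(2)]

# With random_skip: 5000, each worker draws its own starting offset.
with_skip = [worker_records(60000, 5, 5000, rng) for _ in range(2)]

print("no skip:  ", no_skip)
print("with skip:", with_skip)
```

Because the offsets are drawn independently, the workers will usually start at
different records, but this is not an exact partition of the batch: under this
model nothing prevents two workers from drawing overlapping ranges.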
regards,
Wei