Posted to user@helix.apache.org by Vinoth Chandar <vi...@uber.com> on 2016/03/15 21:45:21 UTC

Balancing out skews in FULL_AUTO mode with built-in rebalancer

Hi guys,

We are hitting a fairly well-known issue: we have hundreds of resources, each
with fewer than 8 partitions, spread across 10 servers, and the built-in
assignment always assigns partitions from the first node to the last,
resulting in heavy skew on a few nodes.

Chatted with Kishore offline and made a patch, available here
<https://gist.github.com/vinothchandar/e8837df301501f85e257>. Tested with 5
resources with 2 partitions each across 8 servers; logging out the nodeShift
and the ultimate index picked does indicate that we choose servers other than
the first two, which is good.
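
For anyone following along, a minimal sketch of the idea in the patch (the
real change is in the gist above; the class and method names here are just
illustrative):

import java.util.List;

public class NodeShiftSketch {
  // Offset the starting node index by a hash of the resource name so that
  // small resources do not all pile onto the first few nodes. The patch uses
  // a murmur hash; plain hashCode() is used here only for brevity.
  static String pickNode(String resourceName, int replicaIndex, List<String> nodes) {
    int nodeShift = Math.floorMod(resourceName.hashCode(), nodes.size());
    int index = (nodeShift + replicaIndex) % nodes.size();
    return nodes.get(index);
  }
}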

But:
1) I am guessing it gets overridden by other logic in
computePartitionAssignment(..); the end assignment is still skewed.
2) Even with murmur hash, there is some skew in the nodeShift, which needs
to be ironed out.

I will keep chipping away at this. Any feedback is appreciated.

Thanks
Vinoth

Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by Vinoth Chandar <vi...@uber.com>.
Okay, thanks for the lead. Will try this and report back.


Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by kishore g <g....@gmail.com>.
So computeOrphaned is the one that's causing the behavior.

In the beginning, when nothing is assigned, all replicas are considered
orphans. Once they are considered orphans, they get assigned to a random
node (this overrides everything that is computed by the placement scheme).

I think the logic in computeOrphaned is broken; a replica should only be
treated as an orphan if its preferred node is not part of the live-node list.

Try this in computeOrphaned. Note that the test case might fail because of
this change, and you might have to update it to match the new behavior. I
think it would be good to gate this behavior behind a cluster config
parameter.

 private Set<Replica> computeOrphaned() {
    Set<Replica> orphanedPartitions = new TreeSet<Replica>();
    // A replica is orphaned only if its preferred node is not in the live list.
    for (Entry<Replica, Node> entry : _preferredAssignment.entrySet()) {
      if (!_liveNodesList.contains(entry.getValue())) {
        orphanedPartitions.add(entry.getKey());
      }
    }
    // Replicas that already have an assignment (preferred or not) are not orphans.
    for (Replica r : _existingPreferredAssignment.keySet()) {
      orphanedPartitions.remove(r);
    }
    for (Replica r : _existingNonPreferredAssignment.keySet()) {
      orphanedPartitions.remove(r);
    }

    return orphanedPartitions;
  }
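
To make the config-gated version concrete, here is a rough sketch (the flag
name and the computeOrphanedLegacy() helper are placeholders to show the
shape, not existing Helix settings):

  // Gate the stricter orphan logic behind a cluster config flag so existing
  // clusters keep today's behavior by default.
  private Set<Replica> computeOrphanedGated(Map<String, String> clusterConfigFields) {
    // Flag name is illustrative; it is not an actual Helix config key.
    String value = clusterConfigFields.get("rebalancer.strictOrphanCheck");
    boolean strict = value != null && Boolean.parseBoolean(value);
    // computeOrphanedLegacy() stands in for the current (pre-change) logic.
    return strict ? computeOrphaned() : computeOrphanedLegacy();
  }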


Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by Vinoth Chandar <vi...@uber.com>.
Here you go

https://gist.github.com/vinothchandar/18feedfa84650e3efdc0



Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by kishore g <g....@gmail.com>.
Can you point me to your code (fork/patch)?


Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by Vinoth Chandar <vi...@uber.com>.
Hi Kishore,

I printed out more information and trimmed the test down to 1 resource with
2 partitions; I bring up 8 servers in parallel.

Below is a paste of my logging output, with annotations.

>>> Computing partition assignment
>>>> NodeShift for countLog-2a 0 is 5, index 5
>>>> NodeShift for countLog-2a 1 is 5, index 6

VC: So this part seems fine. We pick nodes at index 5 & 6 instead of 0, 1

>>>>  Preferred Assignment: {countLog-2a_0|0=##########
name=localhost-server-6
preferred:0
nonpreferred:0, countLog-2a_1|0=##########
name=localhost-server-7
preferred:0
nonpreferred:0}

VC: This translates to server-6/server-7 (since I named them starting at 1)

>>>>  Existing Preferred Assignment: {}
>>>>  Existing Non Preferred Assignment: {}
>>>>  Orphaned: [countLog-2a_0|0, countLog-2a_1|0]
>>> Final State Map :{0=ONLINE}
>>>> Final ZK record : countLog-2a,
{}{countLog-2a_0={localhost-server-1=ONLINE},
countLog-2a_1={localhost-server-1=ONLINE}}{countLog-2a_0=[localhost-server-1],
countLog-2a_1=[localhost-server-1]}

VC: But the final effect still seems to be assigning the partitions to
servers 1 & 2 (first two).

Any ideas on where to start poking?
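
In case it helps, this is roughly the comparison I plan to log next (a rough
sketch; partitionNameOf() and the variable names are illustrative):

  // Diff the preferred assignment against the final ideal-state record to see
  // exactly which step diverges; getListField(partition) returns the final
  // preference list from the ZK record printed above.
  for (Map.Entry<Replica, Node> e : preferredAssignment.entrySet()) {
    String partition = partitionNameOf(e.getKey()); // e.g. "countLog-2a_0"; helper is hypothetical
    List<String> finalNodes = record.getListField(partition);
    System.out.println(partition + ": preferred=" + e.getValue() + ", final=" + finalNodes);
  }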


Thanks
Vinoth


Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by Vinoth Chandar <vi...@uber.com>.
Hi Kishore,

I think the changes I made are exercised when computing the preferred
assignment; later, when the reconciliation with the existing assignment /
orphaned partitions etc. happens, I think they do not take effect.
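
In other words, the flow appears to be roughly the following (method names
are my shorthand for what I see in the code, not the exact signatures):

  Map<Replica, Node> preferred = computePreferredAssignment(); // my patch changes this step
  Set<Replica> orphaned = computeOrphaned();                   // on a fresh cluster this is everything
  assignOrphans(orphaned);                                     // ...and this can override the preference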

The effective assignment I saw was that all partitions (2 per resource) were
assigned to the first 2 servers. I started to dig into the above-mentioned
parts of the code; will report back tomorrow when I pick this back up.

Thanks,
Vinoth


Re: Balancing out skews in FULL_AUTO mode with built-in rebalancer

Posted by kishore g <g....@gmail.com>.
> 1) I am guessing it gets overridden by other logic in
> computePartitionAssignment(..); the end assignment is still skewed.

What is the logic you are referring to?

Can you print the assignment count for your use case?
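
Something like this should do it (a rough sketch; it assumes you have the
final IdealState record handy, and the counting code itself is illustrative):

  // Count how many times each node appears across the preference lists of the
  // ideal state; list fields are partition -> preference list.
  Map<String, Integer> countPerNode = new TreeMap<String, Integer>();
  for (List<String> preferenceList : idealState.getRecord().getListFields().values()) {
    for (String node : preferenceList) {
      Integer current = countPerNode.get(node);
      countPerNode.put(node, current == null ? 1 : current + 1);
    }
  }
  System.out.println("Assignment count per node: " + countPerNode);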


thanks,
Kishore G
