Posted to user@helix.apache.org by "vlad.gm@gmail.com" <vl...@gmail.com> on 2014/04/04 01:57:59 UTC

keeping the master node up during bootstrap

Dear all,

I am trying to construct a state model with the following transition
diagram:

OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
         <-----------------------------------

That is, an offline node can go into a bootstrapping state; from the
bootstrapping state it can go into a slave state; from slave it can go to
master, from master back to slave, and from slave it can go offline.

Assume that I have a cluster with two nodes, pf1 and pf2, and a
partition partition_0 with the following ideal state:

partition_0: pf2: MASTER pf1: SLAVE,

and that currently pf1 is serving as a master. When pf2 boots, Helix will
issue, almost simultaneously, two commands:
for pf1: transition from MASTER to SLAVE
for pf2: transition from BOOTSTRAPPING to SLAVE

My understanding is that this happens because Helix tries to execute as
many commands in parallel as possible, and because the target (ideal) state
has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE for
pf2 involves a long data copy step, so I would like to keep pf1 as master
in the meantime. I tried prioritizing the transition from BOOTSTRAPPING to
SLAVE over the transition from MASTER to SLAVE; however, Helix still issues
them in parallel (as it should).

I was wondering what my options would be in order to keep the master up
while the future master is bootstrapping. Could a limit on the number of
transitions be enforced at the partition level? Could I somehow specify
that a state with a slave and a bootstrapping node is undesirable?

As a note, I have also looked at the RSync-replicated filesystem example.
The reason for not using the OfflineOnline or the
MasterSlave model in my application is that I would like the bootstrapping
node to receive updates from clients, i.e. be visible
during the bootstrap. For this reason, I am introducing the new
BOOTSTRAPPING phase in-between OFFLINE and SLAVE.

Regards,
Vlad


PS: The state model definition is as follows:

    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(BOOTSTRAP, 3);
    builder.addState(OFFLINE);
    builder.addState(DROPPED);

    // Set the initial state when the node starts
    builder.initialState(OFFLINE);

    // Add transitions between the states.
    builder.addTransition(OFFLINE, BOOTSTRAP, 4);
    builder.addTransition(BOOTSTRAP, SLAVE, 5);
    builder.addTransition(SLAVE, MASTER, 6);
    builder.addTransition(MASTER, SLAVE, 3);
    builder.addTransition(SLAVE, OFFLINE, 2);
    builder.addTransition(OFFLINE, DROPPED, 1);

    // set constraints on states.
    // static constraint
    builder.upperBound(MASTER, 1);
    // dynamic constraint, R means it should be derived based on the
    // replication factor.
    builder.dynamicUpperBound(SLAVE, "R");
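
The per-partition throttling asked about above can be pictured with a small, self-contained simulation. This is only a sketch and not Helix code: the class PartitionThrottleSketch, the round-based loop, and the rule that a lower priority value is picked first are all assumptions made for illustration, and the priority values are hypothetical rather than the ones in the definition above. It models a single partition with pf1 currently MASTER, pf2 OFFLINE, and the ideal state from the question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionThrottleSketch {
    // Hypothetical priorities; in this toy model a lower value is picked first.
    public static int priority(String from, String to) {
        switch (from + "->" + to) {
            case "SLAVE->MASTER": return 1;
            case "BOOTSTRAP->SLAVE": return 2;
            case "OFFLINE->BOOTSTRAP": return 3;
            case "MASTER->SLAVE": return 4;
            default: throw new IllegalArgumentException(from + "->" + to);
        }
    }

    // Next state on the way to the target (SLAVE or MASTER), or null if done.
    public static String nextState(String current, String target) {
        if (current.equals(target)) return null;
        switch (current) {
            case "OFFLINE": return "BOOTSTRAP";
            case "BOOTSTRAP": return "SLAVE";
            case "SLAVE": return "MASTER";
            case "MASTER": return "SLAVE";
            default: throw new IllegalArgumentException(current);
        }
    }

    // Runs rounds until nothing is left to do; at most maxPerRound
    // transitions are issued for the partition in each round.
    public static List<String> simulate(Map<String, String> current,
                                        Map<String, String> ideal,
                                        int maxPerRound) {
        Map<String, String> state = new LinkedHashMap<>(current);
        List<String> issued = new ArrayList<>();
        for (int round = 1; round <= 10; round++) {
            List<String[]> candidates = new ArrayList<>();
            for (Map.Entry<String, String> e : state.entrySet()) {
                String next = nextState(e.getValue(), ideal.get(e.getKey()));
                if (next == null) continue;
                // Upper bound of one MASTER: no promotion while a master exists.
                if (next.equals("MASTER") && state.containsValue("MASTER")) continue;
                candidates.add(new String[] {e.getKey(), e.getValue(), next});
            }
            if (candidates.isEmpty()) break;
            candidates.sort(Comparator.comparingInt(c -> priority(c[1], c[2])));
            int limit = Math.min(maxPerRound, candidates.size());
            for (String[] c : candidates.subList(0, limit)) {
                state.put(c[0], c[2]);
                issued.add("round " + round + ": " + c[0] + " " + c[1] + "->" + c[2]);
            }
        }
        return issued;
    }

    public static void main(String[] args) {
        Map<String, String> current = new LinkedHashMap<>();
        current.put("pf1", "MASTER");
        current.put("pf2", "OFFLINE");
        Map<String, String> ideal = new LinkedHashMap<>();
        ideal.put("pf1", "SLAVE");
        ideal.put("pf2", "MASTER");

        System.out.println("No limit:");
        simulate(current, ideal, Integer.MAX_VALUE).forEach(System.out::println);
        System.out.println("At most one transition per round:");
        simulate(current, ideal, 1).forEach(System.out::println);
    }
}
```

In this toy model, the unlimited run demotes pf1 in round 1, in the same round in which pf2 only starts bootstrapping. With a limit of one, pf1 stays MASTER until round 3, after pf2 has reached SLAVE. Whether a real controller behaves this way depends on how it enforces constraints and priorities.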

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

Dear all,

The patch worked perfectly. See below the results:

2014-04-10 11:27:46,592 (Thread-2) TaskAssignmentStage INFO: Sending
Message fcaaf416-bd62-43e8-98a3-e9e20201b58e to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-10 11:27:48,045 (Thread-2) TaskAssignmentStage INFO: Sending
Message cfa8985b-1521-42e8-8f38-f58f6852b2ff to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-10 11:27:49,717 (Thread-2) TaskAssignmentStage INFO: Sending
Message 76f358eb-97bc-4457-b260-add038643d65 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

2014-04-10 11:29:35,272 (Thread-2) TaskAssignmentStage INFO: Sending
Message 5c38654b-10fe-4a08-9fe4-41d8b712dd99 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-10 11:29:44,702 (Thread-2) TaskAssignmentStage INFO: Sending
Message 28e41ec2-daeb-4675-b1ac-5e8b094c0012 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-10 11:29:46,442 (Thread-2) TaskAssignmentStage INFO: Sending
Message 3107ceba-380d-4d0f-bad1-964fe166868b to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE

2014-04-10 11:29:47,021 (Thread-2) TaskAssignmentStage INFO: Sending
Message 0fa6e490-b65e-4818-a027-5490c6ebbfd2 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
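
The ordering in a log excerpt like the one above can also be checked mechanically. The following is a minimal sketch, assuming each log entry has been joined back onto a single line (the entries above are wrapped by the mail client). The class and method names are hypothetical, and the regular expression is inferred only from the lines shown in this thread. It looks at when messages were sent, not at when the transitions completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TransitionLogCheck {
    // Matches "Sending Message <id> to <instance> transit <partition> from:<A> to:<B>".
    private static final Pattern SENT = Pattern.compile(
        "Sending Message \\S+ to (\\S+) transit \\S+ from:(\\S+) to:(\\S+)");

    /** Returns "instance FROM->TO" for every matching line, in log order. */
    public static List<String> transitions(List<String> logLines) {
        List<String> out = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SENT.matcher(line);
            if (m.find()) {
                out.add(m.group(1) + " " + m.group(2) + "->" + m.group(3));
            }
        }
        return out;
    }

    /**
     * True if the old master's MASTER->SLAVE message was sent only after
     * the new node's BOOTSTRAP->SLAVE message.
     */
    public static boolean demotionFollowsBootstrap(List<String> logLines,
                                                   String oldMaster,
                                                   String newMaster) {
        List<String> t = transitions(logLines);
        int demote = t.indexOf(oldMaster + " MASTER->SLAVE");
        int caughtUp = t.indexOf(newMaster + " BOOTSTRAP->SLAVE");
        return demote >= 0 && caughtUp >= 0 && caughtUp < demote;
    }

    public static void main(String[] args) {
        // Three entries from the excerpt above, unwrapped onto single lines.
        List<String> lines = Arrays.asList(
            "2014-04-10 11:29:44,702 (Thread-2) TaskAssignmentStage INFO: Sending Message"
                + " 28e41ec2-daeb-4675-b1ac-5e8b094c0012 to pf2.apps-pf.dev.docker_12000"
                + " transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE",
            "2014-04-10 11:29:46,442 (Thread-2) TaskAssignmentStage INFO: Sending Message"
                + " 3107ceba-380d-4d0f-bad1-964fe166868b to pf1.apps-pf.dev.docker_12000"
                + " transit NEWPROFILE_5|[] from:MASTER to:SLAVE",
            "2014-04-10 11:29:47,021 (Thread-2) TaskAssignmentStage INFO: Sending Message"
                + " 0fa6e490-b65e-4818-a027-5490c6ebbfd2 to pf2.apps-pf.dev.docker_12000"
                + " transit NEWPROFILE_5|[] from:SLAVE to:MASTER");
        System.out.println(transitions(lines));
        System.out.println(demotionFollowsBootstrap(lines,
            "pf1.apps-pf.dev.docker_12000", "pf2.apps-pf.dev.docker_12000"));
    }
}
```

For these three entries the check passes, because pf2's BOOTSTRAP->SLAVE message precedes pf1's MASTER->SLAVE message.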

Thank you very much again for the quick solution!

Regards,
Vlad


On Wed, Apr 9, 2014 at 9:42 PM, Vlad Balan <vl...@gmail.com> wrote:

> Thank you for the very quick fix! I will test it and let you know of the
> result!
>
> Regards,
> Vlad
>
> On Apr 9, 2014, at 9:16 PM, kishore g <g....@gmail.com> wrote:
>
> Hi Vlad,
>
> Here is the diff https://reviews.apache.org/r/20196/diff for the fix and
> the test case. If you want to give it a try, apply it on master.
>
> thanks,
> Kishore G
>
>
> On Wed, Apr 9, 2014 at 1:40 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:
>
>>
>> Based on the result of the conversation, we found the following:
>>
>> 1. 0.6.x doesn't support partition constraints. Created
>> https://issues.apache.org/jira/browse/HELIX-426
>> 2. 0.7.x doesn't honor partition constraints correctly. Created
>> https://issues.apache.org/jira/browse/HELIX-425
>>
>> We will try to fix these tomorrow.
>>
>> Kanak
>> ________________________________
>> > Date: Wed, 9 Apr 2014 12:51:10 -0700
>> > Subject: Re: keeping the master node up during bootstrap
>> > From: vlad.gm@gmail.com
>> > To: user@helix.apache.org
>> >
>> > Sure! I'll join the channel!
>> >
>> >
>> > On Wed, Apr 9, 2014 at 12:41 PM, kishore g
>> > <g....@gmail.com> wrote:
>> > Hi Vlad,
>> >
>> > I have some questions. Can you join the IRC channel #apachehelix.
>> >
>> > thanks,
>> > Kishore G
>> >
>> >
>> > On Wed, Apr 9, 2014 at 11:35 AM, vlad.gm@gmail.com
>> > <vl...@gmail.com> wrote:
>> > Upon some further testing, it seems that the controller does not
>> > execute the events in the right sequence.
>> >
>> > Here are the results of some of my testing. Assume that we have a
>> > partition NEWPROFILE_5 with the ideal state:
>> >
>> > "NEWPROFILE_5" : {
>> >
>> > "pf1.apps-pf.dev.docker_12000" : "SLAVE",
>> >
>> > "pf2.apps-pf.dev.docker_12000" : "MASTER"
>> >
>> > }
>> >
>> > I boot the host pf1 and a few minutes later the host pf2. In the
>> > controller logs I see, when doing a grep for NEWPROFILE_5:
>> >
>> > 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message a221b1ac-0807-425e-9062-6507e45b0bfb to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message f36b4d64-c790-413b-b9fa-915b9539d28c to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER
>> > to:SLAVE
>> >
>> > 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 201429e1-e810-4017-b3ef-fb5930ac2192 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > That is, the controller issues an offline->bootstrap command to pf2,
>> > but then issues a master->slave command to pf1 before bringing pf2 up
>> > as a slave (the last step before promotion to master). Since
>> > the bootstrap->slave transition that follows takes time, the system
>> > spends time without a master for the partition.
>> >
>> > The state model definition was:
>> > public static StateModelDefinition defineStateModel() {
>> >   StateModelDefinition.Builder builder =
>> >       new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>> >   // Add states and their rank to indicate priority. Lower the rank
>> >   // higher the priority
>> >   builder.addState(MASTER, 1);
>> >   builder.addState(SLAVE, 2);
>> >   builder.addState(BOOTSTRAP, 3);
>> >   builder.addState(OFFLINE);
>> >   builder.addState(DROPPED);
>> >   // Set the initial state when the node starts
>> >   builder.initialState(OFFLINE);
>> >
>> >   // Add transitions between the states.
>> >   builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>> >   builder.addTransition(BOOTSTRAP, SLAVE, 2);
>> >   builder.addTransition(SLAVE, MASTER, 1);
>> >   builder.addTransition(MASTER, SLAVE, 4);
>> >   builder.addTransition(SLAVE, OFFLINE, 5);
>> >   builder.addTransition(OFFLINE, DROPPED, 6);
>> >
>> >   // set constraints on states.
>> >   // static constraint
>> >   builder.upperBound(MASTER, 1);
>> >   // dynamic constraint, R means it should be derived based on the
>> >   // replication factor.
>> >   builder.dynamicUpperBound(SLAVE, "R");
>> >
>> >   StateModelDefinition statemodelDefinition = builder.build();
>> >
>> >   assert(statemodelDefinition.isValid());
>> >
>> >   return statemodelDefinition;
>> > }
>> >
>> > I have tried reversing the values of the transition priorities. In this
>> > case, the controller log file looked as follows:
>> >
>> > 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message
>> > hasn't been removed for pf1.apps-pf.dev.docker_12000 to
>> > transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>> >
>> > 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER
>> > to:SLAVE
>> >
>> > 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 18bbf028-cb51-4162-8226-a6564a121986 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message
>> > hasn't been removed for pf2.apps-pf.dev.docker_12000 to
>> > transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>> >
>> > 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > That is, the transition for master->slave for pf1 was executed before
>> > taking any action on pf2, clearly the opposite of the right order.
>> >
>> >
>> > On Tue, Apr 8, 2014 at 2:19 PM, Kanak Biscuitwala
>> > <ka...@hotmail.com> wrote:
>> >
>> > Looks good, thanks for sharing!
>> >
>> > Kanak
>> > ________________________________
>> >> Date: Tue, 8 Apr 2014 14:08:28 -0700
>> >> Subject: Re: keeping the master node up during bootstrap
>> >> From: vlad.gm@gmail.com
>> >> To: user@helix.apache.org
>> >>
>> >> My modified code looks like:
>> >>
>> >> /* Setup a Helix cluster for the KVStore */
>> >> public static void setupCluster() {
>> >>   assert(cluster != null);
>> >>   clusterSetup.addCluster(cluster, true);
>> >>
>> >>   ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>> >>
>> >>   constraintItemBuilder
>> >>       .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(),
>> >>           "STATE_TRANSITION")
>> >>       .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>> >>       .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(),
>> >>           "1");
>> >>
>> >>   clusterSetup.getClusterManagementTool().setConstraint(cluster,
>> >>       ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>> >>       "constraint1", constraintItemBuilder.build());
>> >> }
>> >>
>> >> I will try to see whether it works in every situation.
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >>
>> >> On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:
>> >> Hi Kishore,
>> >>
>> >> I managed to implement the bootstrapping using the constraint and it
>> >> appears to be running as expected. I will post my code shortly.
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>> >>
>> >> Hi Vlad,
>> >>
>> >> Did you get a chance to play with the constraint? I can write some sample
>> >> code today to try this.
>> >>
>> >> Thanks,
>> >> Kishore G
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> >>
>> >> Thank you Kanak and Kishore! I will try enforcing the per-partition
>> >> constraint and let you know if somehow it does not work. I was looking
>> >> at the throttling documentation, but somehow missed that a
>> >> per-partition constraint was an option!
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>> >> Hi Vlad,
>> >>
>> >> You can try setting the transition priority order and a constraint that
>> >> there should be only one transition per partition across the cluster.
>> >>
>> >> So the transition priority could be something like
>> >>
>> >> Slave -> Master
>> >> Offline -> Bootstrap
>> >> Bootstrap -> Slave
>> >> Slave -> Master
>> >>
>> >> For the rest not sure if order matters.
>> >>
>> >> Also set the max transitions constraint to 1 per partition.
>> >>
>> >> The reason I put Slave->Master before Offline->Bootstrap is to ensure
>> >> that availability is given more importance. For example, suppose you
>> >> have 3 nodes, N1, N2, and N3, where N1 is Master, N2 is Slave, and N3 is
>> >> down. If N1 goes down and N3 comes up at the same time, we probably
>> >> don't want to wait for N3 to bootstrap before promoting N2 to Master.
>> >>
>> >> I haven't tested this but assuming the constraints enforcement works,
>> >> this should do the trick.
>> >>
>> >> Does this make sense? Let me know if this does not work, we can add a
>> >> test case.
>> >>
>> >> thanks,
>> >> Kishore G

Re: keeping the master node up during bootstrap

Posted by Vlad Balan <vl...@gmail.com>.

Thank you for the very quick fix! I will test it and let you know of the result!

Regards,
Vlad

> On Apr 9, 2014, at 9:16 PM, kishore g <g....@gmail.com> wrote:
> 
> Hi Vlad,
> 
> Here is the diff https://reviews.apache.org/r/20196/diff for the fix and the test case. If you want to give it a try. Apply this on the master.
> 
> thanks,
> Kishore G
> 
> 
>> On Wed, Apr 9, 2014 at 1:40 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:
>> 
>> Based on the result of the conversation, we found the following:
>> 
>> 1. 0.6.x doesn't support partition constraints. Created https://issues.apache.org/jira/browse/HELIX-426
>> 2. 0.7.x doesn't honor partition constraints correctly. Created https://issues.apache.org/jira/browse/HELIX-425
>> 
>> We will try to fix these tomorrow.
>> 
>> Kanak
>> ________________________________
>> > Date: Wed, 9 Apr 2014 12:51:10 -0700
>> > Subject: Re: keeping the master node up during bootstrap
>> > From: vlad.gm@gmail.com
>> > To: user@helix.apache.org
>> >
>> > Sure! I'll join the channel!
>> >
>> >
>> > On Wed, Apr 9, 2014 at 12:41 PM, kishore g
>> > <g....@gmail.com>> wrote:
>> > Hi Vlad,
>> >
>> > I have some questions. Can you join the IRC channel #apachehelix.
>> >
>> > thanks,
>> > Kishore G
>> >
>> >
>> > On Wed, Apr 9, 2014 at 11:35 AM,
>> > vlad.gm@gmail.com<ma...@gmail.com>
>> > <vl...@gmail.com>> wrote:
>> > Upon some further testing, it seems that the controller does not
>> > execute the events in the right sequence.
>> >
>> > Here are the results of some of my testing. Assume that we have a
>> > partition NEWPROFILE_5 with the ideal state:
>> >
>> > "NEWPROFILE_5" : {
>> >
>> > "pf1.apps-pf.dev.docker_12000" : "SLAVE",
>> >
>> > "pf2.apps-pf.dev.docker_12000" : "MASTER"
>> >
>> > }
>> >
>> > I boot the host pf1 and a few minutes later the host pf2. In the
>> > controller logs I see, when doing a grep for NEWPROFILE_5:
>> >
>> > 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message a221b1ac-0807-425e-9062-6507e45b0bfb to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message f36b4d64-c790-413b-b9fa-915b9539d28c to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER
>> > to:SLAVE
>> >
>> > 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 201429e1-e810-4017-b3ef-fb5930ac2192 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > That is, the controller issues an offline->bootstrap command to pf-2,
>> > but then issues a master->slave command to of-1 before bringing pf-2 up
>> > as a slave as well (the last step before promotion to master). Since
>> > the bootstrap->slave that follows takes time, the system spends time
>> > without a master for the partition.
>> >
>> > The state model definition was:
>> > public static StateModelDefinition defineStateModel() {
>> > StateModelDefinition.Builder builder =
>> > new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>> > // Add states and their rank to indicate priority. Lower the rank higher the
>> > // priority
>> > builder.addState(MASTER, 1);
>> > builder.addState(SLAVE, 2);
>> > builder.addState(BOOTSTRAP, 3);
>> > builder.addState(OFFLINE);
>> > builder.addState(DROPPED);
>> > // Set the initial state when the node starts
>> > builder.initialState(OFFLINE);
>> >
>> > // Add transitions between the states.
>> > builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>> > builder.addTransition(BOOTSTRAP, SLAVE, 2);
>> > builder.addTransition(SLAVE, MASTER, 1);
>> > builder.addTransition(MASTER, SLAVE, 4);
>> > builder.addTransition(SLAVE, OFFLINE, 5);
>> > builder.addTransition(OFFLINE, DROPPED, 6);
>> >
>> > // set constraints on states.
>> > // static constraint
>> > builder.upperBound(MASTER, 1);
>> > // dynamic constraint, R means it should be derived based on the replication
>> > // factor.
>> > builder.dynamicUpperBound(SLAVE, "R");
>> >
>> > StateModelDefinition statemodelDefinition = builder.build();
>> >
>> > assert(statemodelDefinition.isValid());
>> >
>> > return statemodelDefinition;
>> > }
>> >
>> > I have tried reversing the values of the transition priorities. In this
>> > case, the controller log file looked as follows:
>> >
>> > 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message
>> > hasn't been removed for pf1.apps-pf.dev.docker_12000 to
>> > transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>> >
>> > 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to
>> > pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER
>> > to:SLAVE
>> >
>> > 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 18bbf028-cb51-4162-8226-a6564a121986 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> > to:BOOTSTRAP
>> >
>> > 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message
>> > hasn't been removed for pf2.apps-pf.dev.docker_12000 to
>> > transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER
>> >
>> > 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP
>> > to:SLAVE
>> >
>> > 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending
>> > Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to
>> > pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE
>> > to:MASTER
>> >
>> > That is, the transition for master->slave for pf1 was executed before
>> > taking any action on pf2, clearly the opposite of the right order.
>> >
>> >
>> > On Tue, Apr 8, 2014 at 2:19 PM, Kanak Biscuitwala
>> > <ka...@hotmail.com>> wrote:
>> >
>> > Looks good, thanks for sharing!
>> >
>> > Kanak
>> > ________________________________
>> >> Date: Tue, 8 Apr 2014 14:08:28 -0700
>> >> Subject: Re: keeping the master node up during bootstrap
>> >> From: vlad.gm@gmail.com<ma...@gmail.com>
>> >> To: user@helix.apache.org<ma...@helix.apache.org>
>> >>
>> >> My modified code looks like:
>> >>
>> >> /* Setup a Helix cluster for the KVStore */
>> >> public static void setupCluster() {
>> >>   assert(cluster != null);
>> >>   clusterSetup.addCluster(cluster, true);
>> >>
>> >>   ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>> >>
>> >>   constraintItemBuilder
>> >>       .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
>> >>       .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>> >>       .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");
>> >>
>> >>   clusterSetup.getClusterManagementTool().setConstraint(cluster,
>> >>       ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>> >>       "constraint1", constraintItemBuilder.build());
>> >> }
>> >>
>> >> I will try to see whether it works in every situation.
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >>
>> >> On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:
>> >> Hi Kishore,
>> >>
>> >> I managed to implement the bootstrapping using the constraint and it
>> >> appears to be running as expected. I will post my code shortly.
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>> >>
>> >> Hi Vlad,
>> >>
>> >> Did you get a chance to play with the constraint? I can write some
>> >> sample code today to try this.
>> >>
>> >> Thanks,
>> >> Kishore G
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> >>
>> >> Thank you Kanak and Kishore! I will try enforcing the per-partition
>> >> constraint and let you know if somehow it does not work. I was looking
>> >> at the throttling documentation, but somehow missed that a
>> >> per-partition constraint was an option!
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>> >> Hi Vlad,
>> >>
>> >> You can try setting the transition priority order and a constraint that
>> >> there should be only one transition per partition across the cluster.
>> >>
>> >> So the transition priority could be something like
>> >>
>> >> Slave -> Master
>> >> Offline -> Bootstrap
>> >> Bootstrap -> Slave
>> >> Master -> Slave
>> >>
>> >> For the rest not sure if order matters.
>> >>
>> >> Also set the max transitions constraint to 1 per partition.
>> >>
>> >> The reason I put Slave -> Master before Offline -> Bootstrap is to ensure
>> >> that availability is given more importance. For example, suppose you have 3
>> >> nodes, N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1
>> >> goes down and N3 comes up at the same time, we probably don't want to
>> >> wait for N3 to bootstrap before promoting N2 to Master.
>> >>
>> >> I haven't tested this but assuming the constraints enforcement works,
>> >> this should do the trick.
>> >>
>> >> Does this make sense? Let me know if this does not work, we can add a
>> >> test case.
>> >>
>> >> thanks,
>> >> Kishore G
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> >>
>> >> Dear all,
>> >>
>> >> I am trying to construct a state model with the following transition
>> > diagram:
>> >>
>> >> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>> >>         <-----------------------------------
>> >>
>> >> That is, an offline node can go into a bootstrapping state, from the
>> >> bootstrap state it can go into a slave state,
>> >> from slave it can go to master, from master to slave, and from slave
>> >> it can go offline.
>> >>
>> >> Assume that I have a cluster with two nodes, pf1 and pf2, and a
>> >> partition partition_0 with the following ideal state:
>> >>
>> >> partition_0: pf2: MASTER pf1: SLAVE,
>> >>
>> >> and that currently pf1 is serving as a master. When pf2 boots, Helix
>> >> will issue, almost simultaneously, two commands:
>> >> for pf1: transition from MASTER to SLAVE
>> >> for pf2: transition from BOOTSTRAPPING to SLAVE
>> >>
>> >> My understanding is that this happens because Helix tries to execute
>> >> as many commands in parallel as possible and because the last state
>> >> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>> >> for pf2 involves a long data copy step, so
>> >> I would like to keep pf1 as a master in the meantime. I tried
>> >> prioritizing the transition from BOOTSTRAPPING to SLAVE
>> >> over the transition from MASTER to SLAVE; however, Helix still issues
>> >> them in parallel (as it should).
>> >>
>> >> I was wondering what my options would be in order to keep the master up
>> >> while the future master is bootstrapping. Could
>> >> a throttling in the number of transitions be enforced at partition
>> >> level? Could I somehow specify that a state with a slave
>> >> and a bootstrapping node is undesirable?
>> >>
>> >> As a note, I have also looked at the rsync-replicated filesystem
>> >> example. The reason for not using the OfflineOnline or the
>> >> MasterSlave model in my application is that I would like the
>> >> bootstrapping node to receive updates from clients, i.e. be visible
>> >> during the bootstrap. For this reason, I am introducing the new
>> >> BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>> >>
>> >> Regards,
>> >> Vlad
>> >>
>> >>
>> >> PS: The state model definition is as follows:
>> >>
>> >> builder.addState(MASTER, 1);
>> >> builder.addState(SLAVE, 2);
>> >> builder.addState(BOOTSTRAP, 3);
>> >> builder.addState(OFFLINE);
>> >> builder.addState(DROPPED);
>> >>
>> >> // Set the initial state when the node starts
>> >> builder.initialState(OFFLINE);
>> >>
>> >> // Add transitions between the states.
>> >> builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>> >> builder.addTransition(BOOTSTRAP, SLAVE, 5);
>> >> builder.addTransition(SLAVE, MASTER, 6);
>> >> builder.addTransition(MASTER, SLAVE, 3);
>> >> builder.addTransition(SLAVE, OFFLINE, 2);
>> >> builder.addTransition(OFFLINE, DROPPED, 1);
>> >>
>> >> // set constraints on states.
>> >> // static constraint
>> >> builder.upperBound(MASTER, 1);
>> >> // dynamic constraint, R means it should be derived based on the replication
>> >> // factor.
>> >> builder.dynamicUpperBound(SLAVE, "R");
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>

Re: keeping the master node up during bootstrap

Posted by kishore g <g....@gmail.com>.

Hi Vlad,

Here is the diff for the fix and the test case:
https://reviews.apache.org/r/20196/diff
If you want to give it a try, apply it on master.

thanks,
Kishore G
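
A rough way to check the ordering from a controller log when trying out a patch like this one is to replay the "Sending Message" lines and flag any MASTER->SLAVE demotion that is sent while no other replica of the same partition has yet been sent to SLAVE. The sketch below is only an illustration: the class name, the method name and the regular expression are made up here, it assumes each log entry has been unwrapped onto a single line in the format quoted in this thread, and it looks only at messages sent, not at whether they completed. It is not part of Helix or of the linked diff.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MasterGapCheck {
    // Hypothetical pattern for the unwrapped form of the log lines quoted in this thread:
    // "... Sending Message <id> to <node> transit <partition>|[] from:<X> to:<Y>"
    private static final Pattern SENT = Pattern.compile(
        "Sending Message \\S+ to (\\S+) transit (\\S+)\\|\\[\\] from:(\\w+) to:(\\w+)");

    /**
     * Returns true if a MASTER->SLAVE message is sent for some partition while
     * no other replica of that partition has been sent to SLAVE yet.
     */
    static boolean demotedWithoutReadySlave(List<String> logLines) {
        // Last target state sent to each replica, keyed by "partition@node".
        Map<String, String> lastState = new HashMap<>();
        for (String line : logLines) {
            Matcher m = SENT.matcher(line);
            if (!m.find()) {
                continue; // not a "Sending Message" line
            }
            String node = m.group(1);
            String partition = m.group(2);
            String from = m.group(3);
            String to = m.group(4);
            String self = partition + "@" + node;
            if (from.equals("MASTER") && to.equals("SLAVE")) {
                boolean slaveReady = false;
                for (Map.Entry<String, String> e : lastState.entrySet()) {
                    boolean samePartition = e.getKey().startsWith(partition + "@");
                    if (samePartition && !e.getKey().equals(self)
                            && e.getValue().equals("SLAVE")) {
                        slaveReady = true;
                    }
                }
                if (!slaveReady) {
                    return true;
                }
            }
            lastState.put(self, to);
        }
        return false;
    }

    public static void main(String[] args) {
        // Two entries from the 2014-04-08 log in this thread, unwrapped.
        List<String> fromThread = List.of(
            "2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending Message "
                + "ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to pf2.apps-pf.dev.docker_12000 "
                + "transit NEWPROFILE_5|[] from:OFFLINE to:BOOTSTRAP",
            "2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending Message "
                + "f36b4d64-c790-413b-b9fa-915b9539d28c to pf1.apps-pf.dev.docker_12000 "
                + "transit NEWPROFILE_5|[] from:MASTER to:SLAVE");
        System.out.println(demotedWithoutReadySlave(fromThread));
    }
}
```

For the two entries above the check reports the problem described in this thread, because pf1 is demoted while pf2 has only been sent to BOOTSTRAP.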


On Wed, Apr 9, 2014 at 1:40 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:

>
> Based on the result of the conversation, we found the following:
>
> 1. 0.6.x doesn't support partition constraints. Created
> https://issues.apache.org/jira/browse/HELIX-426
> 2. 0.7.x doesn't honor partition constraints correctly. Created
> https://issues.apache.org/jira/browse/HELIX-425
>
> We will try to fix these tomorrow.
>
> Kanak

RE: keeping the master node up during bootstrap

Posted by Kanak Biscuitwala <ka...@hotmail.com>.

Based on the result of the conversation, we found the following:

1. 0.6.x doesn't support partition constraints. Created https://issues.apache.org/jira/browse/HELIX-426
2. 0.7.x doesn't honor partition constraints correctly. Created https://issues.apache.org/jira/browse/HELIX-425

We will try to fix these tomorrow.

Kanak
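
To illustrate what a per-partition limit of one in-flight transition is meant to achieve once it is honored, here is a toy round-based simulation of the pf1/pf2 scenario from this thread. It is only a sketch and not how the Helix controller is implemented: the class and method names are made up, each round treats the sent transitions as completing immediately, and the priorities follow the ordering suggested earlier in the thread (Slave -> Master first, then Offline -> Bootstrap, Bootstrap -> Slave, and Master -> Slave last), which is an assumption about the intended order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionThrottleSketch {
    private static final List<String> UP = List.of("OFFLINE", "BOOTSTRAP", "SLAVE", "MASTER");

    // Assumed transition priorities; a lower number is sent first.
    private static final Map<String, Integer> PRIORITY = Map.of(
        "SLAVE->MASTER", 1,
        "OFFLINE->BOOTSTRAP", 2,
        "BOOTSTRAP->SLAVE", 3,
        "MASTER->SLAVE", 4,
        "SLAVE->OFFLINE", 5);

    // Next hop from the current state toward the target state, or null if already there.
    static String nextState(String current, String target) {
        int c = UP.indexOf(current);
        int t = UP.indexOf(target);
        if (c == t) {
            return null;
        }
        if (c < t) {
            return UP.get(c + 1);
        }
        return current.equals("MASTER") ? "SLAVE" : "OFFLINE";
    }

    // Simulates one partition with at most maxPerRound transitions sent per round.
    static List<String> simulate(int maxPerRound) {
        Map<String, String> current = new LinkedHashMap<>();
        current.put("pf1", "MASTER");
        current.put("pf2", "OFFLINE");
        Map<String, String> ideal = Map.of("pf1", "SLAVE", "pf2", "MASTER");
        List<String> sent = new ArrayList<>();
        for (int round = 1; round <= 10; round++) {
            boolean masterExists = current.containsValue("MASTER");
            List<String[]> candidates = new ArrayList<>();
            for (Map.Entry<String, String> e : current.entrySet()) {
                String next = nextState(e.getValue(), ideal.get(e.getKey()));
                if (next == null) {
                    continue;
                }
                // Mirrors upperBound(MASTER, 1): no promotion while another MASTER exists.
                if (next.equals("MASTER") && masterExists) {
                    continue;
                }
                candidates.add(new String[] {e.getKey(), e.getValue(), next});
            }
            if (candidates.isEmpty()) {
                break;
            }
            candidates.sort(Comparator.comparingInt(c -> PRIORITY.get(c[1] + "->" + c[2])));
            int limit = Math.min(maxPerRound, candidates.size());
            for (String[] c : candidates.subList(0, limit)) {
                current.put(c[0], c[2]);
                sent.add(round + ": " + c[0] + " " + c[1] + "->" + c[2]);
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("limit 1: " + simulate(1));
        System.out.println("limit 2: " + simulate(2));
    }
}
```

With a limit of 1, the demotion of pf1 is sent only after pf2 has gone through BOOTSTRAP and reached SLAVE. With a limit of 2, the demotion goes out in the same round as pf2's OFFLINE->BOOTSTRAP, which is the master-less window reported in this thread.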
________________________________
> Date: Wed, 9 Apr 2014 12:51:10 -0700 
> Subject: Re: keeping the master node up during bootstrap 
> From: vlad.gm@gmail.com 
> To: user@helix.apache.org 
> 
> Sure! I'll join the channel! 
> 
> 
> On Wed, Apr 9, 2014 at 12:41 PM, kishore g
> <g....@gmail.com> wrote:
> Hi Vlad, 
> 
> I have some questions. Can you join the IRC channel #apachehelix?
> 
> thanks, 
> Kishore G 
> 
> 
> On Wed, Apr 9, 2014 at 11:35 AM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
> Upon some further testing, it seems that the controller does not 
> execute the events in the right sequence. 
> 
> Here are the results of some of my testing. Assume that we have a 
> partition NEWPROFILE_5 with the ideal state: 
> 
> "NEWPROFILE_5" : { 
> 
> "pf1.apps-pf.dev.docker_12000" : "SLAVE", 
> 
> "pf2.apps-pf.dev.docker_12000" : "MASTER" 
> 
> } 
> 
> I boot the host pf1 and a few minutes later the host pf2. In the 
> controller logs I see, when doing a grep for NEWPROFILE_5: 
> 
> 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE 
> to:MASTER 
> 
> 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message a221b1ac-0807-425e-9062-6507e45b0bfb to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE 
> to:BOOTSTRAP 
> 
> 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP 
> to:SLAVE 
> 
> 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE 
> to:MASTER 
> 
> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE 
> to:BOOTSTRAP 
> 
> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message f36b4d64-c790-413b-b9fa-915b9539d28c to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER 
> to:SLAVE 
> 
> 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 201429e1-e810-4017-b3ef-fb5930ac2192 to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP 
> to:SLAVE 
> 
> 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE 
> to:MASTER 
> 
> That is, the controller issues an offline->bootstrap command to pf2,
> but then issues a master->slave command to pf1 before bringing pf2 up
> as a slave (the last step before promotion to master). Since
> the bootstrap->slave transition that follows takes time, the partition
> is left without a master during that period.
> 
> The state model definition was: 
> public static StateModelDefinition defineStateModel() {
>   StateModelDefinition.Builder builder =
>       new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>   // Add states and their rank to indicate priority. The lower the rank,
>   // the higher the priority.
>   builder.addState(MASTER, 1);
>   builder.addState(SLAVE, 2);
>   builder.addState(BOOTSTRAP, 3);
>   builder.addState(OFFLINE);
>   builder.addState(DROPPED);
>   // Set the initial state when the node starts
>   builder.initialState(OFFLINE);
>
>   // Add transitions between the states.
>   builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>   builder.addTransition(BOOTSTRAP, SLAVE, 2);
>   builder.addTransition(SLAVE, MASTER, 1);
>   builder.addTransition(MASTER, SLAVE, 4);
>   builder.addTransition(SLAVE, OFFLINE, 5);
>   builder.addTransition(OFFLINE, DROPPED, 6);
>
>   // set constraints on states.
>   // static constraint
>   builder.upperBound(MASTER, 1);
>   // dynamic constraint, R means it should be derived based on the replication
>   // factor.
>   builder.dynamicUpperBound(SLAVE, "R");
>
>   StateModelDefinition statemodelDefinition = builder.build();
>
>   assert(statemodelDefinition.isValid());
>
>   return statemodelDefinition;
> }
> 
> I have tried reversing the values of the transition priorities. In this 
> case, the controller log file looked as follows: 
> 
> 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE 
> to:BOOTSTRAP 
> 
> 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message 
> hasn't been removed for pf1.apps-pf.dev.docker_12000 to 
> transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER 
> 
> 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP 
> to:SLAVE 
> 
> 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE 
> to:MASTER 
> 
> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to 
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER 
> to:SLAVE 
> 
> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 18bbf028-cb51-4162-8226-a6564a121986 to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE 
> to:BOOTSTRAP 
> 
> 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message 
> hasn't been removed for pf2.apps-pf.dev.docker_12000 to 
> transitNEWPROFILE_5 to BOOTSTRAP, desiredState: MASTER 
> 
> 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP 
> to:SLAVE 
> 
> 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending 
> Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to 
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE 
> to:MASTER 
> 
> That is, the transition for master->slave for pf1 was executed before 
> taking any action on pf2, clearly the opposite of the right order. 
> 
> 
> On Tue, Apr 8, 2014 at 2:19 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:
> 
> Looks good, thanks for sharing! 
> 
> Kanak 
> ________________________________ 
>> Date: Tue, 8 Apr 2014 14:08:28 -0700 
>> Subject: Re: keeping the master node up during bootstrap 
>> From: vlad.gm@gmail.com
>> To: user@helix.apache.org
>> 
>> My modified code looks like: 
>> 
>> /* Setup a Helix cluster for the KVStore */
>> public static void setupCluster() {
>>   assert(cluster != null);
>>   clusterSetup.addCluster(cluster, true);
>>
>>   ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>>
>>   constraintItemBuilder
>>       .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
>>       .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>>       .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");
>>
>>   clusterSetup.getClusterManagementTool().setConstraint(cluster,
>>       ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>>       "constraint1", constraintItemBuilder.build());
>> }
>> 
>> I will try to see whether it works in every situation. 
>> 
>> Regards, 
>> Vlad 
>> 
>> 
>> On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:
>> Hi Kishore, 
>> 
>> I managed to implement the bootstrapping using the constraint and it 
>> appears to be running as expected. I will post my code shortly. 
>> 
>> Regards, 
>> Vlad 
>> 
>> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>> 
>> Hi Vlad, 
>> 
>> Did you get a chance to play with the constraint? I can write some sample
>> code today to try this.
>> 
>> Thanks, 
>> Kishore G 
>> 
>> 
>> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> 
>> Thank you Kanak and Kishore! I will try enforcing the per-partition 
>> constraint and let you know if somehow it does not work. I was looking 
>> at the throttling documentation, but somehow missed that a 
>> per-partition constraint was an option! 
>> 
>> Regards, 
>> Vlad 
>> 
>> 
>> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>> Hi Vlad, 
>> 
>> You can try setting the transition priority order and a constraint that 
>> there should be only one transition per partition across the cluster. 
>> 
>> So the transition priority could be something like:
>>
>> Slave -> Master
>> Offline -> Bootstrap
>> Bootstrap -> Slave
>> Slave -> Master
>>
>> For the rest, I am not sure whether the order matters.
>>
>> Also set the max transitions constraint to 1 per partition.
>>
>> The reason I put Slave -> Master before Offline -> Bootstrap is to ensure
>> that availability is given more importance. For example, suppose you have 3
>> nodes, N1, N2 and N3: N1 is Master, N2 is Slave, and N3 is down. If N1
>> goes down and N3 comes up at the same time, we probably don't want to
>> wait for N3 to bootstrap before promoting N2 to Master.
>> 
>> I haven't tested this but assuming the constraints enforcement works, 
>> this should do the trick. 
>> 
>> Does this make sense? Let me know if this does not work, we can add a 
>> test case. 
>> 
>> thanks, 
>> Kishore G 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> 
>> Dear all, 
>> 
>> I am trying to construct a state model with the following transition
>> diagram:
>>
>> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>         <-----------------------------------
>>
>> That is, an offline node can go into a bootstrapping state, from the
>> bootstrapping state it can go into a slave state,
>> from slave it can go to master, from master to slave, and from slave
>> it can go offline.
>>
>> Assume that I have a cluster with two nodes, pf1 and pf2, and a
>> partition partition_0 with the following ideal state:
>> 
>> partition_0: pf2: MASTER pf1: SLAVE, 
>> 
>> and that currently pf1 is serving as a master. When pf2 boots, Helix 
>> will issue, almost simultaneously, two commands: 
>> for pf1: transition from MASTER to SLAVE 
>> for pf2: transition from BOOTSTRAPPING to SLAVE 
>> 
>> My understanding is that this happens since Helix is trying to execute
>> as many commands in parallel as possible and since the final state
>> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>> for pf2 involves a long data copy step, so
>> I would like to keep pf1 as master in the meantime. I tried
>> prioritizing the transition from BOOTSTRAPPING to SLAVE 
>> over the transition from MASTER to SLAVE, however Helix still issues 
>> them in parallel (as it should). 
>> 
>> I was wondering what my options would be in order to keep the master up 
>> while the future master is bootstrapping. Could 
>> a throttling in the number of transitions be enforced at partition 
>> level? Could I somehow specify that a state with a slave 
>> and a bootstrapping node is undesirable? 
>> 
>> As a note, I have also looked at the rsync-replicated filesystem
>> example. The reason for not using the OfflineOnline or the
>> MasterSlave model in my application is that I would like the 
>> bootstrapping node to receive updates from clients, i.e. be visible 
>> during the bootstrap. For this reason, I am introducing the new 
>> BOOTSTRAPPING phase in-between OFFLINE and SLAVE. 
>> 
>> Regards, 
>> Vlad 
>> 
>> 
>> PS: The state model definition is as follows: 
>> 
>> builder.addState(MASTER, 1);
>> builder.addState(SLAVE, 2);
>> builder.addState(BOOTSTRAP, 3);
>> builder.addState(OFFLINE);
>> builder.addState(DROPPED);
>> // Set the initial state when the node starts
>> builder.initialState(OFFLINE);
>>
>> // Add transitions between the states.
>> builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>> builder.addTransition(BOOTSTRAP, SLAVE, 5);
>> builder.addTransition(SLAVE, MASTER, 6);
>> builder.addTransition(MASTER, SLAVE, 3);
>> builder.addTransition(SLAVE, OFFLINE, 2);
>> builder.addTransition(OFFLINE, DROPPED, 1);
>>
>> // set constraints on states.
>> // static constraint
>> builder.upperBound(MASTER, 1);
>> // dynamic constraint, R means it should be derived based on the
>> // replication factor.
>> builder.dynamicUpperBound(SLAVE, "R");
>> 
>> 
>> 
>> 
> 
> 
> 
>
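
The suggestion quoted above (a transition priority order combined with at
most one state transition per partition at a time) can be illustrated with a
small standalone sketch. It does not use the Helix API: the class
TransitionThrottleSketch, the Pending type and the selectForRound method are
hypothetical names, and the priority values are taken from the quoted state
model definition. It only models the intended selection rule and says nothing
about what the controller actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these types belong to Apache Helix.
final class TransitionThrottleSketch {

    /** One candidate state transition for a replica of a partition. */
    static final class Pending {
        final String partition;
        final String node;
        final String from;
        final String to;

        Pending(String partition, String node, String from, String to) {
            this.partition = partition;
            this.node = node;
            this.from = from;
            this.to = to;
        }

        /** Key used to look up the transition priority, e.g. "MASTER-SLAVE". */
        String key() {
            return from + "-" + to;
        }
    }

    /**
     * Chooses the transitions to send in one round: candidates are ordered by
     * priority (lower number first) and at most one is kept per partition.
     */
    static List<Pending> selectForRound(List<Pending> candidates,
                                        Map<String, Integer> priority) {
        List<Pending> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt(
                (Pending p) -> priority.getOrDefault(p.key(), Integer.MAX_VALUE)));
        Set<String> busyPartitions = new HashSet<>();
        List<Pending> selected = new ArrayList<>();
        for (Pending p : sorted) {
            // Set.add returns false if the partition already has a transition.
            if (busyPartitions.add(p.partition)) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> priority = new HashMap<>();
        priority.put("OFFLINE-BOOTSTRAP", 3);
        priority.put("BOOTSTRAP-SLAVE", 2);
        priority.put("SLAVE-MASTER", 1);
        priority.put("MASTER-SLAVE", 4);

        // pf1 is currently MASTER; pf2 has just come up and is OFFLINE.
        List<Pending> candidates = Arrays.asList(
                new Pending("partition_0", "pf1", "MASTER", "SLAVE"),
                new Pending("partition_0", "pf2", "OFFLINE", "BOOTSTRAP"));

        for (Pending p : selectForRound(candidates, priority)) {
            System.out.println(p.node + ": " + p.from + " -> " + p.to);
        }
    }
}
```

In this model the first round sends only the OFFLINE -> BOOTSTRAP transition
to pf2, and a later round with MASTER -> SLAVE (pf1) and BOOTSTRAP -> SLAVE
(pf2) as candidates would again prefer pf2, so pf1 would remain master while
pf2 copies data.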

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

Sure! I'll join the channel!


On Wed, Apr 9, 2014 at 12:41 PM, kishore g <g....@gmail.com> wrote:

> Hi Vlad,
>
> I have some questions. Can you join the IRC channel #apachehelix?
>
> thanks,
> Kishore G
>
>
> On Wed, Apr 9, 2014 at 11:35 AM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>
>> Upon some further testing, it seems that the controller does not execute
>> the events in the right sequence.
>>
>> Here are the results of some of my testing. Assume that we have a
>> partition NEWPROFILE_5 with the ideal state:
>>
>>  "NEWPROFILE_5" : {
>>
>>       "pf1.apps-pf.dev.docker_12000" : "SLAVE",
>>
>>       "pf2.apps-pf.dev.docker_12000" : "MASTER"
>>
>>     }
>>
>> I boot the host pf1 and a few minutes later the host pf2. In the
>> controller logs I see, when doing a grep for NEWPROFILE_5:
>>
>> 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message a221b1ac-0807-425e-9062-6507e45b0bfb to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> to:BOOTSTRAP
>>
>> 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> to:BOOTSTRAP
>>
>> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message f36b4d64-c790-413b-b9fa-915b9539d28c to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>>
>> 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 201429e1-e810-4017-b3ef-fb5930ac2192 to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> That is, the controller issues an offline->bootstrap command to pf2, but
>> then issues a master->slave command to pf1 before bringing pf2 up as a
>> slave as well (the last step before promotion to master). Since the
>> bootstrap->slave transition that follows takes time, the partition is left
>> without a master in the meantime.
>>
>> The state model definition was:
>> public static StateModelDefinition defineStateModel() {
>>   StateModelDefinition.Builder builder =
>>       new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>>   // Add states and their rank to indicate priority. Lower the rank, higher
>>   // the priority.
>>   builder.addState(MASTER, 1);
>>   builder.addState(SLAVE, 2);
>>   builder.addState(BOOTSTRAP, 3);
>>   builder.addState(OFFLINE);
>>   builder.addState(DROPPED);
>>   // Set the initial state when the node starts
>>   builder.initialState(OFFLINE);
>>
>>   // Add transitions between the states.
>>   builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>>   builder.addTransition(BOOTSTRAP, SLAVE, 2);
>>   builder.addTransition(SLAVE, MASTER, 1);
>>   builder.addTransition(MASTER, SLAVE, 4);
>>   builder.addTransition(SLAVE, OFFLINE, 5);
>>   builder.addTransition(OFFLINE, DROPPED, 6);
>>
>>   // set constraints on states.
>>   // static constraint
>>   builder.upperBound(MASTER, 1);
>>   // dynamic constraint, R means it should be derived based on the
>>   // replication factor.
>>   builder.dynamicUpperBound(SLAVE, "R");
>>
>>   StateModelDefinition statemodelDefinition = builder.build();
>>
>>   assert(statemodelDefinition.isValid());
>>
>>   return statemodelDefinition;
>> }
>>
>> I have tried reversing the values of the transition priorities. In this
>> case, the controller log file looked as follows:
>>
>> 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> to:BOOTSTRAP
>>
>> 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message
>> hasn't been removed for pf1.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
>> to BOOTSTRAP, desiredState: MASTER
>>
>> 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to
>> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>>
>> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 18bbf028-cb51-4162-8226-a6564a121986 to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
>> to:BOOTSTRAP
>>
>> 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message
>> hasn't been removed for pf2.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
>> to BOOTSTRAP, desiredState: MASTER
>>
>> 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>>
>> 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending
>> Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to
>> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>>
>> That is, the transition for master->slave for pf1 was executed before
>> taking any action on pf2, clearly the opposite of the right order.
>>
>>
>> On Tue, Apr 8, 2014 at 2:19 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:
>>
>>>
>>> Looks good, thanks for sharing!
>>>
>>> Kanak
>>> ________________________________
>>> > Date: Tue, 8 Apr 2014 14:08:28 -0700
>>> > Subject: Re: keeping the master node up during bootstrap
>>> > From: vlad.gm@gmail.com
>>> > To: user@helix.apache.org
>>> >
>>> > My modified code looks like:
>>> >
>>> > /* Setup a Helix cluster for the KVStore */
>>> > public static void setupCluster() {
>>> >   assert(cluster != null);
>>> >   clusterSetup.addCluster(cluster, true);
>>> >
>>> >   ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>>> >
>>> >   constraintItemBuilder
>>> >       .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
>>> >       .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>>> >       .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");
>>> >
>>> >   clusterSetup.getClusterManagementTool().setConstraint(cluster,
>>> >       ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>>> >       "constraint1", constraintItemBuilder.build());
>>> > }
>>> >
>>> > I will try to see whether it works in every situation.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:
>>> > Hi Kishore,
>>> >
>>> > I managed to implement the bootstrapping using the constraint and it
>>> > appears to be running as expected. I will post my code shortly.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> > On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>>> >
>>> > Hi Vlad,
>>> >
>>> > Did you get a chance to play with the constraint? I can write some sample
>>> > code today to try this.
>>> >
>>> > Thanks,
>>> > Kishore G
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>>> >
>>> > Thank you Kanak and Kishore! I will try enforcing the per-partition
>>> > constraint and let you know if somehow it does not work. I was looking
>>> > at the throttling documentation, but somehow missed that a
>>> > per-partition constraint was an option!
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>>> > Hi Vlad,
>>> >
>>> > You can try setting the transition priority order and a constraint that
>>> > there should be only one transition per partition across the cluster.
>>> >
>>> > So the transition priority could be something like:
>>> >
>>> > Slave -> Master
>>> > Offline -> Bootstrap
>>> > Bootstrap -> Slave
>>> > Slave -> Master
>>> >
>>> > For the rest, I am not sure whether the order matters.
>>> >
>>> > Also set the max transitions constraint to 1 per partition.
>>> >
>>> > The reason I put Slave -> Master before Offline -> Bootstrap is to ensure
>>> > that availability is given more importance. For example, suppose you have 3
>>> > nodes, N1, N2 and N3: N1 is Master, N2 is Slave, and N3 is down. If N1
>>> > goes down and N3 comes up at the same time, we probably don't want to
>>> > wait for N3 to bootstrap before promoting N2 to Master.
>>> >
>>> > I haven't tested this but assuming the constraints enforcement works,
>>> > this should do the trick.
>>> >
>>> > Does this make sense? Let me know if this does not work, we can add a
>>> > test case.
>>> >
>>> > thanks,
>>> > Kishore G
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>>> >
>>> > Dear all,
>>> >
>>> > I am trying to construct a state model with the following transition
>>> > diagram:
>>> >
>>> > OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>> >         <-----------------------------------
>>> >
>>> > That is, an offline node can go into a bootstrapping state, from the
>>> > bootstrapping state it can go into a slave state,
>>> > from slave it can go to master, from master to slave, and from slave
>>> > it can go offline.
>>> >
>>> > Assume that I have a cluster with two nodes, pf1 and pf2, and a
>>> > partition partition_0 with the following ideal state:
>>> >
>>> > partition_0: pf2: MASTER pf1: SLAVE,
>>> >
>>> > and that currently pf1 is serving as a master. When pf2 boots, Helix
>>> > will issue, almost simultaneously, two commands:
>>> > for pf1: transition from MASTER to SLAVE
>>> > for pf2: transition from BOOTSTRAPPING to SLAVE
>>> >
>>> > My understanding is that this happens since Helix is trying to execute
>>> > as many commands in parallel as possible and since the final state
>>> > has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>>> > for pf2 involves a long data copy step, so
>>> > I would like to keep pf1 as master in the meantime. I tried
>>> > prioritizing the transition from BOOTSTRAPPING to SLAVE
>>> > over the transition from MASTER to SLAVE, however Helix still issues
>>> > them in parallel (as it should).
>>> >
>>> > I was wondering what my options would be in order to keep the master up
>>> > while the future master is bootstrapping. Could
>>> > a throttling in the number of transitions be enforced at partition
>>> > level? Could I somehow specify that a state with a slave
>>> > and a bootstrapping node is undesirable?
>>> >
>>> > As a note, I have also looked at the rsync-replicated filesystem
>>> > example. The reason for not using the OfflineOnline or the
>>> > MasterSlave model in my application is that I would like the
>>> > bootstrapping node to receive updates from clients, i.e. be visible
>>> > during the bootstrap. For this reason, I am introducing the new
>>> > BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>> >
>>> > Regards,
>>> > Vlad
>>> >
>>> >
>>> > PS: The state model definition is as follows:
>>> >
>>> > builder.addState(MASTER, 1);
>>> > builder.addState(SLAVE, 2);
>>> > builder.addState(BOOTSTRAP, 3);
>>> > builder.addState(OFFLINE);
>>> > builder.addState(DROPPED);
>>> > // Set the initial state when the node starts
>>> > builder.initialState(OFFLINE);
>>> >
>>> > // Add transitions between the states.
>>> > builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>>> > builder.addTransition(BOOTSTRAP, SLAVE, 5);
>>> > builder.addTransition(SLAVE, MASTER, 6);
>>> > builder.addTransition(MASTER, SLAVE, 3);
>>> > builder.addTransition(SLAVE, OFFLINE, 2);
>>> > builder.addTransition(OFFLINE, DROPPED, 1);
>>> >
>>> > // set constraints on states.
>>> > // static constraint
>>> > builder.upperBound(MASTER, 1);
>>> > // dynamic constraint, R means it should be derived based on the
>>> > // replication factor.
>>> > builder.dynamicUpperBound(SLAVE, "R");
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>
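
The window without a master described in the quoted logs can be estimated
mechanically. The following is a minimal standalone sketch, assuming each
wrapped controller log entry has first been joined back into a single line;
the class MasterGapSketch and the method masterlessMillis are hypothetical
names, not part of Helix. It measures the time between a MASTER -> SLAVE
message and the next SLAVE -> MASTER message for one partition. These are
the times at which messages were sent, not the times at which the
transitions completed, so the result is only an approximation.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the controller log; not part of Apache Helix.
final class MasterGapSketch {

    // Matches one unwrapped "Sending Message" line as shown in the quoted log.
    private static final Pattern SEND = Pattern.compile(
            "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) .*Sending Message \\S+ to (\\S+) "
            + "transit (\\S+)\\|\\[\\] from:(\\S+) to:(\\S+)$");

    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /**
     * Sums the time, in milliseconds, between each MASTER->SLAVE message and
     * the next SLAVE->MASTER message for the given partition.
     */
    static long masterlessMillis(List<String> lines, String partition) {
        LocalDateTime demotedAt = null;
        long total = 0;
        for (String line : lines) {
            Matcher m = SEND.matcher(line.trim());
            if (!m.matches() || !m.group(3).equals(partition)) {
                continue; // not a transition message for this partition
            }
            LocalDateTime at = LocalDateTime.parse(m.group(1), TIMESTAMP);
            String from = m.group(4);
            String to = m.group(5);
            if (from.equals("MASTER") && to.equals("SLAVE")) {
                demotedAt = at;
            } else if (from.equals("SLAVE") && to.equals("MASTER") && demotedAt != null) {
                total += Duration.between(demotedAt, at).toMillis();
                demotedAt = null;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending Message "
                + "f36b4d64-c790-413b-b9fa-915b9539d28c to pf1.apps-pf.dev.docker_12000 "
                + "transit NEWPROFILE_5|[] from:MASTER to:SLAVE",
                "2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending Message "
                + "4a1fb64c-1063-4e49-a995-946d2dd25733 to pf2.apps-pf.dev.docker_12000 "
                + "transit NEWPROFILE_5|[] from:SLAVE to:MASTER");
        System.out.println(masterlessMillis(lines, "NEWPROFILE_5") + " ms without a master");
    }
}
```

Run against the 2014-04-08 entries quoted above, the gap between the
demotion of pf1 and the promotion of pf2 comes out at a little over five
seconds; with a real data copy in the bootstrap->slave step it would be
correspondingly longer.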

Re: keeping the master node up during bootstrap

Posted by kishore g <g....@gmail.com>.

Hi Vlad,

I have some questions. Can you join the IRC channel #apachehelix?

thanks,
Kishore G


On Wed, Apr 9, 2014 at 11:35 AM, vlad.gm@gmail.com <vl...@gmail.com> wrote:

> Upon some further testing, it seems that the controller does not execute
> the events in the right sequence.
>
> Here are the results of some of my testing. Assume that we have a
> partition NEWPROFILE_5 with the ideal state:
>
>  "NEWPROFILE_5" : {
>
>       "pf1.apps-pf.dev.docker_12000" : "SLAVE",
>
>       "pf2.apps-pf.dev.docker_12000" : "MASTER"
>
>     }
>
> I boot the host pf1 and a few minutes later the host pf2. In the
> controller logs I see, when doing a grep for NEWPROFILE_5:
>
> 2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>
> 2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending
> Message a221b1ac-0807-425e-9062-6507e45b0bfb to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
> to:BOOTSTRAP
>
> 2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>
> 2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending
> Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>
> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
> Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
> to:BOOTSTRAP
>
> 2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
> Message f36b4d64-c790-413b-b9fa-915b9539d28c to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>
> 2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 201429e1-e810-4017-b3ef-fb5930ac2192 to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>
> 2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>
> That is, the controller issues an offline->bootstrap command to pf2, but
> then issues a master->slave command to pf1 before bringing pf2 up as a
> slave as well (the last step before promotion to master). Since the
> bootstrap->slave transition that follows takes time, the partition is left
> without a master in the meantime.
>
> The state model definition was:
> public static StateModelDefinition defineStateModel() {
>   StateModelDefinition.Builder builder =
>       new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
>   // Add states and their rank to indicate priority. Lower the rank, higher
>   // the priority.
>   builder.addState(MASTER, 1);
>   builder.addState(SLAVE, 2);
>   builder.addState(BOOTSTRAP, 3);
>   builder.addState(OFFLINE);
>   builder.addState(DROPPED);
>   // Set the initial state when the node starts
>   builder.initialState(OFFLINE);
>
>   // Add transitions between the states.
>   builder.addTransition(OFFLINE, BOOTSTRAP, 3);
>   builder.addTransition(BOOTSTRAP, SLAVE, 2);
>   builder.addTransition(SLAVE, MASTER, 1);
>   builder.addTransition(MASTER, SLAVE, 4);
>   builder.addTransition(SLAVE, OFFLINE, 5);
>   builder.addTransition(OFFLINE, DROPPED, 6);
>
>   // set constraints on states.
>   // static constraint
>   builder.upperBound(MASTER, 1);
>   // dynamic constraint, R means it should be derived based on the
>   // replication factor.
>   builder.dynamicUpperBound(SLAVE, "R");
>
>   StateModelDefinition statemodelDefinition = builder.build();
>
>   assert(statemodelDefinition.isValid());
>
>   return statemodelDefinition;
> }
>
> I have tried reversing the values of the transition priorities. In this
> case, the controller log file looked as follows:
>
> 2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
> to:BOOTSTRAP
>
> 2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message
> hasn't been removed for pf1.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
> to BOOTSTRAP, desiredState: MASTER
>
> 2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending
> Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>
> 2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending
> Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>
> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to
> pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE
>
> 2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 18bbf028-cb51-4162-8226-a6564a121986 to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
> to:BOOTSTRAP
>
> 2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message
> hasn't been removed for pf2.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
> to BOOTSTRAP, desiredState: MASTER
>
> 2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending
> Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE
>
> 2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending
> Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to
> pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER
>
> That is, the transition for master->slave for pf1 was executed before
> taking any action on pf2, clearly the opposite of the right order.
>
>

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

Upon some further testing, it seems that the controller does not execute
the events in the right sequence.

Here are the results of some of my testing. Assume that we have a partition
NEWPROFILE_5 with the ideal state:

    "NEWPROFILE_5" : {
      "pf1.apps-pf.dev.docker_12000" : "SLAVE",
      "pf2.apps-pf.dev.docker_12000" : "MASTER"
    }

I boot the host pf1 and a few minutes later the host pf2. In the controller
logs I see, when doing a grep for NEWPROFILE_5:

2014-04-08 17:04:35,309 (Thread-2) TaskAssignmentStage INFO: Sending
Message 69b4eddf-ac5f-4726-9d6b-bac742ad082e to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

2014-04-08 17:27:08,187 (Thread-2) TaskAssignmentStage INFO: Sending
Message a221b1ac-0807-425e-9062-6507e45b0bfb to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-08 17:27:10,164 (Thread-2) TaskAssignmentStage INFO: Sending
Message 73ed85fd-49c9-46a5-b262-687d612c7d06 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-08 17:27:11,868 (Thread-2) TaskAssignmentStage INFO: Sending
Message fb21aecc-68cf-4b9f-9718-aa6ed535c29d to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
Message ea441d18-b1f3-4ceb-96a2-3262cab1dfbe to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-08 17:28:22,978 (Thread-2) TaskAssignmentStage INFO: Sending
Message f36b4d64-c790-413b-b9fa-915b9539d28c to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE

2014-04-08 17:28:26,065 (Thread-2) TaskAssignmentStage INFO: Sending
Message 201429e1-e810-4017-b3ef-fb5930ac2192 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-08 17:28:28,238 (Thread-2) TaskAssignmentStage INFO: Sending
Message 4a1fb64c-1063-4e49-a995-946d2dd25733 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

That is, the controller issues an offline->bootstrap command to pf2, but
then issues a master->slave command to pf1 before bringing pf2 up as a
slave (the last step before promotion to master). Since the
bootstrap->slave transition that follows takes time, the partition is left
without a master for that period.
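
To put a number on that window, the log entries above can be fed to a small checker. The following is only a sketch in plain Java with no Helix dependency: the class MasterGapEstimator and its methods are hypothetical names invented for this illustration, and it measures the time between "Sending Message" log lines, which only approximates when the transitions actually completed.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Helix): estimates the master-less window from log entries. */
class MasterGapEstimator {

    /** Timestamp layout used by the controller log lines quoted above. */
    static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** One "Sending Message ... transit" entry, reduced to the fields used here. */
    static final class Transition {
        final LocalDateTime sentAt;
        final String node;
        final String from;
        final String to;

        Transition(String timestamp, String node, String from, String to) {
            this.sentAt = LocalDateTime.parse(timestamp, LOG_TIME);
            this.node = node;
            this.from = from;
            this.to = to;
        }
    }

    /**
     * Sums the time between each demotion (from MASTER) and the next promotion
     * (to MASTER). Assumes the entries are in time order and belong to one partition.
     */
    static Duration timeWithoutMaster(List<Transition> log) {
        Duration total = Duration.ZERO;
        LocalDateTime demotedAt = null;
        for (Transition t : log) {
            if (t.from.equals("MASTER")) {
                demotedAt = t.sentAt;
            } else if (t.to.equals("MASTER") && demotedAt != null) {
                total = total.plus(Duration.between(demotedAt, t.sentAt));
                demotedAt = null;
            }
        }
        return total;
    }

    /** The four entries from the log above that cover the pf1 -> pf2 handover. */
    static List<Transition> handoverLog() {
        return Arrays.asList(
            new Transition("2014-04-08 17:28:22,978", "pf2", "OFFLINE", "BOOTSTRAP"),
            new Transition("2014-04-08 17:28:22,978", "pf1", "MASTER", "SLAVE"),
            new Transition("2014-04-08 17:28:26,065", "pf2", "BOOTSTRAP", "SLAVE"),
            new Transition("2014-04-08 17:28:28,238", "pf2", "SLAVE", "MASTER"));
    }

    public static void main(String[] args) {
        Duration gap = timeWithoutMaster(handoverLog());
        System.out.println("Approximate time without a master: " + gap.toMillis() + " ms");
    }
}
```

For the entries above this reports 5260 ms. In this test the bootstrap step was short; with a real data copy the same gap would last as long as the copy does.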

The state model definition was:
public static StateModelDefinition defineStateModel() {
    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder(KVHelixDefinitions.STATE_MODEL_NAME);
    // Add states and their rank to indicate priority. Lower the rank higher the
    // priority
    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(BOOTSTRAP, 3);
    builder.addState(OFFLINE);
    builder.addState(DROPPED);
    // Set the initial state when the node starts
    builder.initialState(OFFLINE);

    // Add transitions between the states.
    builder.addTransition(OFFLINE, BOOTSTRAP, 3);
    builder.addTransition(BOOTSTRAP, SLAVE, 2);
    builder.addTransition(SLAVE, MASTER, 1);
    builder.addTransition(MASTER, SLAVE, 4);
    builder.addTransition(SLAVE, OFFLINE, 5);
    builder.addTransition(OFFLINE, DROPPED, 6);

    // set constraints on states.
    // static constraint
    builder.upperBound(MASTER, 1);
    // dynamic constraint, R means it should be derived based on the replication
    // factor.
    builder.dynamicUpperBound(SLAVE, "R");

    StateModelDefinition statemodelDefinition = builder.build();

    assert(statemodelDefinition.isValid());

    return statemodelDefinition;
}
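
For reference, the ordering these priorities are meant to produce, when combined with a limit of one in-flight transition per partition, can be modelled in a few lines. This is a sketch in plain Java, not Helix code: TransitionPicker and pickNext are hypothetical names, and it assumes that a lower number means a higher priority for transitions, which this thread does not confirm. It shows the intended selection rule, not what the controller actually did in the logs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model (not Helix code) of "one transition per partition, best priority first". */
class TransitionPicker {

    /** Transition priorities copied from the state model definition above. */
    static final Map<String, Integer> PRIORITY = new HashMap<>();
    static {
        PRIORITY.put("OFFLINE->BOOTSTRAP", 3);
        PRIORITY.put("BOOTSTRAP->SLAVE", 2);
        PRIORITY.put("SLAVE->MASTER", 1);
        PRIORITY.put("MASTER->SLAVE", 4);
        PRIORITY.put("SLAVE->OFFLINE", 5);
        PRIORITY.put("OFFLINE->DROPPED", 6);
    }

    /**
     * Returns the one transition to issue next for a partition, or null if none is pending.
     * Assumption: a lower number means a higher priority. Every pending
     * transition must be a key of PRIORITY.
     */
    static String pickNext(List<String> pending) {
        String best = null;
        for (String transition : pending) {
            if (best == null || PRIORITY.get(transition) < PRIORITY.get(best)) {
                best = transition;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // pf2 has just joined: its first step competes with demoting pf1.
        System.out.println(pickNext(Arrays.asList("MASTER->SLAVE", "OFFLINE->BOOTSTRAP")));
        // pf2 is bootstrapping: its data copy competes with demoting pf1.
        System.out.println(pickNext(Arrays.asList("MASTER->SLAVE", "BOOTSTRAP->SLAVE")));
    }
}
```

Under that assumption both calls pick the pf2 transition, so pf1 would stay master until pf2 has reached SLAVE. If higher numbers win instead, the same rule picks MASTER->SLAVE first in both cases.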

I have tried reversing the values of the transition priorities. In this
case, the controller log file looked as follows:

2014-04-09 11:17:52,831 (Thread-2) TaskAssignmentStage INFO: Sending
Message 2b29a319-c1c6-4042-b1ad-3e3c1b5092a7 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-09 11:17:55,672 (Thread-2) MessageGenerationStage INFO: Message
hasn't been removed for pf1.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
to BOOTSTRAP, desiredState: MASTER

2014-04-09 11:17:57,047 (Thread-2) TaskAssignmentStage INFO: Sending
Message b1ca701d-65f1-46b9-9ae4-286400d6d266 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-09 11:17:58,888 (Thread-2) TaskAssignmentStage INFO: Sending
Message fe10228f-8f5b-4133-964a-5f6c7e60b0e6 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
Message 6252a4e6-0ab8-490a-a51d-c47195c434b5 to
pf1.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:MASTER to:SLAVE

2014-04-09 11:23:26,117 (Thread-2) TaskAssignmentStage INFO: Sending
Message 18bbf028-cb51-4162-8226-a6564a121986 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:OFFLINE
to:BOOTSTRAP

2014-04-09 11:23:33,462 (Thread-2) MessageGenerationStage INFO: Message
hasn't been removed for pf2.apps-pf.dev.docker_12000 to transitNEWPROFILE_5
to BOOTSTRAP, desiredState: MASTER

2014-04-09 11:23:33,892 (Thread-2) TaskAssignmentStage INFO: Sending
Message c7fc4983-9d71-4dc4-bfee-2ad69e4de411 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:BOOTSTRAP to:SLAVE

2014-04-09 11:23:35,933 (Thread-2) TaskAssignmentStage INFO: Sending
Message 75e715ed-3d53-4e39-b1e7-44695e4bfa03 to
pf2.apps-pf.dev.docker_12000 transit NEWPROFILE_5|[] from:SLAVE to:MASTER

That is, the master->slave transition for pf1 was issued before any action
was taken on pf2, which is clearly the opposite of the right order.



RE: keeping the master node up during bootstrap

Posted by Kanak Biscuitwala <ka...@hotmail.com>.

Looks good, thanks for sharing!

Kanak
________________________________
> Date: Tue, 8 Apr 2014 14:08:28 -0700 
> Subject: Re: keeping the master node up during bootstrap 
> From: vlad.gm@gmail.com 
> To: user@helix.apache.org 
> 
> My modified code looks like: 
> 
> /* Setup a Helix cluster for the KVStore */
> public static void setupCluster() {
>     assert(cluster != null);
>     clusterSetup.addCluster(cluster, true);
>
>     ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
>
>     constraintItemBuilder
>         .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
>         .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
>         .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");
>
>     clusterSetup.getClusterManagementTool().setConstraint(cluster,
>         ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
>         "constraint1", constraintItemBuilder.build());
> }
> 
> I will try to see whether it works in every situation. 
> 
> Regards, 
> Vlad 
> 
> 
> On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:
> Hi Kishore, 
> 
> I managed to implement the bootstrapping using the constraint and it 
> appears to be running as expected. I will post my code shortly. 
> 
> Regards, 
> Vlad 
> 
> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>
> Hi Vlad,
>
> Did you get a chance to play with the constraint? I can write some sample
> code today to try this.
> 
> Thanks, 
> Kishore G 
> 
> 
> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
> 
> Thank you Kanak and Kishore! I will try enforcing the per-partition 
> constraint and let you know if somehow it does not work. I was looking 
> at the throttling documentation, but somehow missed that a 
> per-partition constraint was an option! 
> 
> Regards, 
> Vlad 
> 
> 
> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
> Hi Vlad,
>
> You can try setting the transition priority order and a constraint that
> there should be only one transition per partition across the cluster.
>
> So the transition priority could be something like
>
> Slave->Master
> Offline->Bootstrap
> Bootstrap->Slave
> Slave->Master
>
> For the rest, I am not sure whether the order matters.
>
> Also set the max transitions constraint to 1 per partition.
>
> The reason I put Slave->Master before Offline->Bootstrap is to ensure
> that availability is given more importance. For example, suppose you have
> 3 nodes, N1, N2, N3, where N1 is Master, N2 is Slave, and N3 is down. If N1
> goes down and N3 comes up at the same time, we probably don't want to
> wait for N3 to bootstrap before promoting N2 to Master.
> 
> I haven't tested this but assuming the constraints enforcement works, 
> this should do the trick. 
> 
> Does this make sense? Let me know if this does not work, we can add a 
> test case. 
> 
> thanks, 
> Kishore G 
> 
> 
> 
> 
> 
> 
> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
> 
> Dear all, 
> 
> I am trying to construct a state model with the following transition diagram: 
> 
> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>          <-----------------------------------
>
> That is, an offline node can go into a bootstrapping state, from the
> bootstrap state it can go into a slave state,
> from slave it can go to master, from master to slave, and from slave
> it can go offline.
>
> Assume that I have two nodes pf1 and pf2 and a
> partition partition_0 with the following ideal state:
> 
> partition_0: pf2: MASTER pf1: SLAVE, 
> 
> and that currently pf1 is serving as a master. When pf2 boots, Helix 
> will issue, almost simultaneously, two commands: 
> for pf1: transition from MASTER to SLAVE 
> for pf2: transition from BOOTSTRAPPING to SLAVE 
> 
> My understanding is that this happens since Helix is trying to execute
> as many commands in parallel as possible and since the last state
> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
> for pf2 involves a long data copy step, so
> I would like to keep pf1 as a master in the meantime. I tried
> prioritizing the transition from BOOTSTRAPPING to SLAVE
> over the transition from MASTER to SLAVE, however Helix still issues
> them in parallel (as it should).
> 
> I was wondering what my options would be in order to keep the master up 
> while the future master is bootstrapping. Could 
> a throttling in the number of transitions be enforced at partition 
> level? Could I somehow specify that a state with a slave 
> and a bootstrapping node is undesirable? 
> 
> As a note, I have also looked at the RSync-replicated filesystem
> example. The reason for not using the OfflineOnline or the 
> MasterSlave model in my application is that I would like the 
> bootstrapping node to receive updates from clients, i.e. be visible 
> during the bootstrap. For this reason, I am introducing the new 
> BOOTSTRAPPING phase in-between OFFLINE and SLAVE. 
> 
> Regards, 
> Vlad 
> 
> 
> PS: The state model definition is as follows: 
> 
> builder.addState(MASTER, 1);
> builder.addState(SLAVE, 2);
> builder.addState(BOOTSTRAP, 3);
> builder.addState(OFFLINE);
> builder.addState(DROPPED);
>
> // Set the initial state when the node starts
> builder.initialState(OFFLINE);
>
> // Add transitions between the states.
> builder.addTransition(OFFLINE, BOOTSTRAP, 4);
> builder.addTransition(BOOTSTRAP, SLAVE, 5);
> builder.addTransition(SLAVE, MASTER, 6);
> builder.addTransition(MASTER, SLAVE, 3);
> builder.addTransition(SLAVE, OFFLINE, 2);
> builder.addTransition(OFFLINE, DROPPED, 1);
>
> // set constraints on states.
> // static constraint
> builder.upperBound(MASTER, 1);
>
> // dynamic constraint, R means it should be derived based on the replication
> // factor.
> builder.dynamicUpperBound(SLAVE, "R");
> 
> 
> 
>

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

My modified code looks like:

/* Set up a Helix cluster for the KVStore */
public static void setupCluster() {
    assert (cluster != null);
    clusterSetup.addCluster(cluster, true);

    // Allow at most one state-transition message at a time per partition.
    ConstraintItemBuilder constraintItemBuilder = new ConstraintItemBuilder();
    constraintItemBuilder
        .addConstraintAttribute(ConstraintAttribute.MESSAGE_TYPE.toString(), "STATE_TRANSITION")
        .addConstraintAttribute(ConstraintAttribute.PARTITION.toString(), ".*")
        .addConstraintAttribute(ConstraintAttribute.CONSTRAINT_VALUE.toString(), "1");

    clusterSetup.getClusterManagementTool().setConstraint(cluster,
            ClusterConstraints.ConstraintType.MESSAGE_CONSTRAINT,
            "constraint1", constraintItemBuilder.build());
}

I will try to see whether it works in every situation.

Regards,
Vlad


On Tue, Apr 8, 2014 at 8:59 AM, Vlad Balan <vl...@gmail.com> wrote:

> Hi Kishore,
>
> I managed to implement the bootstrapping using the constraint and it
> appears to be running as expected. I will post my code shortly.
>
> Regards,
> Vlad
>
> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
>
> Hi Vlad,
>
> Did you get a chance to play with the constraint? I can write some sample
> code today to try this.
>
> Thanks,
> Kishore G
>
>
> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com>wrote:
>
>>
>> Thank you Kanak and Kishore! I will try enforcing the per-partition
>> constraint and let you know if somehow it does not work. I was looking at
>> the throttling documentation, but somehow missed that a per-partition
>> constraint was an option!
>>
>> Regards,
>> Vlad
>>
>>
>> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>>
>>> Hi Vlad,
>>>
>>> You can try setting the transition priority order and a constraint that
>>> there should be only one transition per partition across the cluster.
>>>
>>> So the transition priority could be something like
>>>
>>> Slave->Master
>>> Offline->Bootstrap
>>> Bootstrap->Slave
>>> Slave->Master
>>>
>>> For the rest, I am not sure if the order matters.
>>>
>>> Also set the max transitions constraint to 1 per partition.
>>>
>>> The reason I put Slave->Master before Offline->Bootstrap is to ensure
>>> that availability is given more importance. For example, suppose you have 3
>>> nodes, N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1 goes
>>> down and N3 comes up at the same time, we probably don't want to wait for N3
>>> to bootstrap before promoting N2 to Master.
>>>
>>> I haven't tested this but assuming the constraints enforcement works,
>>> this should do the trick.
>>>
>>> Does this make sense? Let me know if this does not work, we can add a
>>> test case.
>>>
>>> thanks,
>>> Kishore G
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com>wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> I am trying to construct a state model with the following transition
>>>> diagram:
>>>>
>>>> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>>>          <-----------------------------------
>>>>
>>>> That is, an offline node can go into a bootstrapping state, from the
>>>> bootstrapping state it can go into a slave state,
>>>> from slave it can go to master, from master to slave, and from slave
>>>> it can go offline.
>>>>
>>>> Assume that I have two nodes, pf1 and pf2, and a
>>>> partition partition_0 with the following ideal state:
>>>>
>>>> partition_0: pf2: MASTER pf1: SLAVE,
>>>>
>>>> and that currently pf1 is serving as a master. When pf2 boots, Helix
>>>> will issue, almost simultaneously, two commands:
>>>> for pf1: transition from MASTER to SLAVE
>>>> for pf2: transition from BOOTSTRAPPING to SLAVE
>>>>
>>>> My understanding is that this happens since Helix is trying to execute
>>>> as many commands in parallel and since the last state
>>>> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>>>> for pf2 involves a long data copy step, so
>>>> I would like to keep pf1 as a master in the meanwhile. I tried
>>>> prioritizing the transition from BOOTSTRAPPING to SLAVE
>>>> over the transition from MASTER to SLAVE, however Helix still issues
>>>> them in parallel (as it should).
>>>>
>>>> I was wondering what my options would be in order to keep the master up
>>>> while the future master is bootstrapping. Could
>>>> a throttling in the number of transitions be enforced at partition
>>>> level? Could I somehow specify that a state with a slave
>>>> and a bootstrapping node is undesirable?
>>>>
>>>> As a note, I have also looked at the RSync-replicated filesystem
>>>> example. The reason for not using the OfflineOnline or the
>>>> MasterSlave model in my application is that I would like the
>>>> bootstrapping node to receive updates from clients, i.e. be visible
>>>> during the bootstrap. For this reason, I am introducing the new
>>>> BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>>>
>>>> Regards,
>>>> Vlad
>>>>
>>>>
>>>> PS: The state model definition is as follows:
>>>>
>>>> builder.addState(MASTER, 1);
>>>>
>>>>
>>>>
>>>>             builder.addState(SLAVE, 2);
>>>>
>>>>
>>>>
>>>>             builder.addState(BOOTSTRAP, 3);
>>>>
>>>>
>>>>
>>>>             builder.addState(OFFLINE);
>>>>
>>>>
>>>>
>>>>             builder.addState(DROPPED);
>>>>
>>>>
>>>>
>>>>             // Set the initial state when the node starts
>>>>
>>>>
>>>>
>>>>             builder.initialState(OFFLINE);
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>             // Add transitions between the states.
>>>>
>>>>
>>>>
>>>>             builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>>>>
>>>>
>>>>
>>>>             builder.addTransition(BOOTSTRAP, SLAVE, 5);
>>>>
>>>>
>>>>
>>>>             builder.addTransition(SLAVE, MASTER, 6);
>>>>
>>>>
>>>>
>>>>             builder.addTransition(MASTER, SLAVE, 3);
>>>>
>>>>
>>>>
>>>>             builder.addTransition(SLAVE, OFFLINE, 2);
>>>>
>>>>
>>>>
>>>>             builder.addTransition(OFFLINE, DROPPED, 1);
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>             // set constraints on states.
>>>>
>>>>
>>>>
>>>>             // static constraint
>>>>
>>>>
>>>>
>>>>             builder.upperBound(MASTER, 1);
>>>>
>>>>
>>>>
>>>>             // dynamic constraint, R means it should be derived based
>>>> on the replication
>>>>
>>>>
>>>>             // factor.
>>>>
>>>>
>>>>
>>>>             builder.dynamicUpperBound(SLAVE, "R");
>>>>
>>>
>>>
>>
>

Re: keeping the master node up during bootstrap

Posted by Vlad Balan <vl...@gmail.com>.

Hi Kishore,

I managed to implement the bootstrapping using the constraint and it appears to be running as expected. I will post my code shortly.

Regards,
Vlad

> On Apr 8, 2014, at 8:27 AM, kishore g <g....@gmail.com> wrote:
> 
> Hi Vlad,
> 
> Did you get a chance to play with the constraint? I can write some sample code today to try this.
> 
> Thanks,
> Kishore G
> 
> 
>> On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>> 
>> Thank you Kanak and Kishore! I will try enforcing the per-partition constraint and let you know if somehow it does not work. I was looking at the throttling documentation, but somehow missed that a per-partition constraint was an option!
>> 
>> Regards,
>> Vlad
>> 
>> 
>>> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>>> Hi Vlad,
>>> 
>>> You can try setting the transition priority order and a constraint that there should be only one transition per partition across the cluster.
>>> 
>>> So the transition priority could be something like
>>> 
>>> Slave->Master
>>> Offline->Bootstrap
>>> Bootstrap->Slave
>>> Slave->Master
>>> 
>>> For the rest, I am not sure if the order matters.
>>> 
>>> Also set the max transitions constraint to 1 per partition.
>>> 
>>> The reason I put Slave->Master before Offline->Bootstrap is to ensure that availability is given more importance. For example, suppose you have 3 nodes, N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1 goes down and N3 comes up at the same time, we probably don't want to wait for N3 to bootstrap before promoting N2 to Master.
>>> 
>>> I haven't tested this but assuming the constraints enforcement works, this should do the trick.
>>> 
>>> Does this make sense? Let me know if this does not work, we can add a test case.
>>> 
>>> thanks,
>>> Kishore G
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:
>>>> 
>>>> Dear all,
>>>> 
>>>> I am trying to construct a state model with the following transition diagram:
>>>> 
>>>> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>>>          <-----------------------------------
>>>> 
>>>> That is, an offline node can go into a bootstrapping state, from the bootstrapping state it can go into a slave state,
>>>> from slave it can go to master, from master to slave, and from slave it can go offline.
>>>> 
>>>> Assume that I have two nodes, pf1 and pf2, and a partition partition_0 with the following ideal state:
>>>> 
>>>> partition_0: pf2: MASTER pf1: SLAVE,
>>>> 
>>>> and that currently pf1 is serving as a master. When pf2 boots, Helix will issue, almost simultaneously, two commands:
>>>> for pf1: transition from MASTER to SLAVE
>>>> for pf2: transition from BOOTSTRAPPING to SLAVE
>>>> 
>>>> My understanding is that this happens since Helix is trying to execute as many commands in parallel and since the last state
>>>> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE for pf2 involves a long data copy step, so
>>>> I would like to keep pf1 as a master in the meanwhile. I tried prioritizing the transition from BOOTSTRAPPING to SLAVE
>>>> over the transition from MASTER to SLAVE, however Helix still issues them in parallel (as it should).
>>>> 
>>>> I was wondering what my options would be in order to keep the master up while the future master is bootstrapping. Could
>>>> a throttling in the number of transitions be enforced at partition level? Could I somehow specify that a state with a slave
>>>> and a bootstrapping node is undesirable?
>>>> 
>>>> As a note, I have also looked at the RSync-replicated filesystem example. The reason for not using the OfflineOnline or the
>>>> MasterSlave model in my application is that I would like the bootstrapping node to receive updates from clients, i.e. be visible
>>>> during the bootstrap. For this reason, I am introducing the new BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>>> 
>>>> Regards,
>>>> Vlad
>>>> 
>>>> 
>>>> PS: The state model definition is as follows:
>>>> builder.addState(MASTER, 1);                                                                                                                                         
>>>> 
>>>>             builder.addState(SLAVE, 2);                                                                                                                                          
>>>> 
>>>>             builder.addState(BOOTSTRAP, 3);                                                                                                                                      
>>>> 
>>>>             builder.addState(OFFLINE);                                                                                                                                           
>>>> 
>>>>             builder.addState(DROPPED);                                                                                                                                            
>>>> 
>>>>             // Set the initial state when the node starts                                                                                                                        
>>>> 
>>>>             builder.initialState(OFFLINE);                                                                                                                                        
>>>> 
>>>>                                                                                                                                                                                   
>>>> 
>>>>             // Add transitions between the states.                                                                                                                               
>>>> 
>>>>             builder.addTransition(OFFLINE, BOOTSTRAP, 4);                                                                                                                         
>>>> 
>>>>             builder.addTransition(BOOTSTRAP, SLAVE, 5);                                                                                                                           
>>>> 
>>>>             builder.addTransition(SLAVE, MASTER, 6);                                                                                                                              
>>>> 
>>>>             builder.addTransition(MASTER, SLAVE, 3);                                                                                                                             
>>>> 
>>>>             builder.addTransition(SLAVE, OFFLINE, 2);                                                                                                                            
>>>> 
>>>>             builder.addTransition(OFFLINE, DROPPED, 1);                                                                                                                           
>>>> 
>>>>                                                                                                                                                                                   
>>>> 
>>>>             // set constraints on states.                                                                                                                                        
>>>> 
>>>>             // static constraint                                                                                                                                                 
>>>> 
>>>>             builder.upperBound(MASTER, 1);                                                                                                                                        
>>>> 
>>>>             // dynamic constraint, R means it should be derived based on the replication                                                                                          
>>>> 
>>>>             // factor.                                                                                                                                                           
>>>> 
>>>>             builder.dynamicUpperBound(SLAVE, "R");                     
>>>> 
>

Re: keeping the master node up during bootstrap

Posted by kishore g <g....@gmail.com>.

Hi Vlad,

Did you get a chance to play with the constraint? I can write some sample
code today to try this.

Thanks,
Kishore G


On Thu, Apr 3, 2014 at 5:45 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:

>
> Thank you Kanak and Kishore! I will try enforcing the per-partition
> constraint and let you know if somehow it does not work. I was looking at
> the throttling documentation, but somehow missed that a per-partition
> constraint was an option!
>
> Regards,
> Vlad
>
>
> On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:
>
>> Hi Vlad,
>>
>> You can try setting the transition priority order and a constraint that
>> there should be only one transition per partition across the cluster.
>>
>> So the transition priority could be something like
>>
>> Slave->Master
>> Offline->Bootstrap
>> Bootstrap->Slave
>> Slave->Master
>>
>> For the rest, I am not sure if the order matters.
>>
>> Also set the max transitions constraint to 1 per partition.
>>
>> The reason I put Slave->Master before Offline->Bootstrap is to ensure that
>> availability is given more importance. For example, suppose you have 3 nodes,
>> N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1 goes down and N3
>> comes up at the same time, we probably don't want to wait for N3 to
>> bootstrap before promoting N2 to Master.
>>
>> I haven't tested this but assuming the constraints enforcement works,
>> this should do the trick.
>>
>> Does this make sense? Let me know if this does not work, we can add a
>> test case.
>>
>> thanks,
>> Kishore G
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com>wrote:
>>
>>>
>>> Dear all,
>>>
>>> I am trying to construct a state model with the following transition
>>> diagram:
>>>
>>> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>>          <-----------------------------------
>>>
>>> That is, an offline node can go into a bootstrapping state, from the
>>> bootstrapping state it can go into a slave state,
>>> from slave it can go to master, from master to slave, and from slave it
>>> can go offline.
>>>
>>> Assume that I have two nodes, pf1 and pf2, and a
>>> partition partition_0 with the following ideal state:
>>>
>>> partition_0: pf2: MASTER pf1: SLAVE,
>>>
>>> and that currently pf1 is serving as a master. When pf2 boots, Helix
>>> will issue, almost simultaneously, two commands:
>>> for pf1: transition from MASTER to SLAVE
>>> for pf2: transition from BOOTSTRAPPING to SLAVE
>>>
>>> My understanding is that this happens since Helix is trying to execute
>>> as many commands in parallel and since the last state
>>> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>>> for pf2 involves a long data copy step, so
>>> I would like to keep pf1 as a master in the meanwhile. I tried
>>> prioritizing the transition from BOOTSTRAPPING to SLAVE
>>> over the transition from MASTER to SLAVE, however Helix still issues
>>> them in parallel (as it should).
>>>
>>> I was wondering what my options would be in order to keep the master up
>>> while the future master is bootstrapping. Could
>>> a throttling in the number of transitions be enforced at partition
>>> level? Could I somehow specify that a state with a slave
>>> and a bootstrapping node is undesirable?
>>>
>>> As a note, I have also looked at the RSync-replicated filesystem
>>> example. The reason for not using the OfflineOnline or the
>>> MasterSlave model in my application is that I would like the
>>> bootstrapping node to receive updates from clients, i.e. be visible
>>> during the bootstrap. For this reason, I am introducing the new
>>> BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>>
>>> Regards,
>>> Vlad
>>>
>>>
>>> PS: The state model definition is as follows:
>>>
>>> builder.addState(MASTER, 1);
>>>
>>>
>>>
>>>             builder.addState(SLAVE, 2);
>>>
>>>
>>>
>>>             builder.addState(BOOTSTRAP, 3);
>>>
>>>
>>>
>>>             builder.addState(OFFLINE);
>>>
>>>
>>>
>>>             builder.addState(DROPPED);
>>>
>>>
>>>
>>>             // Set the initial state when the node starts
>>>
>>>
>>>
>>>             builder.initialState(OFFLINE);
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>             // Add transitions between the states.
>>>
>>>
>>>
>>>             builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>>>
>>>
>>>
>>>             builder.addTransition(BOOTSTRAP, SLAVE, 5);
>>>
>>>
>>>
>>>             builder.addTransition(SLAVE, MASTER, 6);
>>>
>>>
>>>
>>>             builder.addTransition(MASTER, SLAVE, 3);
>>>
>>>
>>>
>>>             builder.addTransition(SLAVE, OFFLINE, 2);
>>>
>>>
>>>
>>>             builder.addTransition(OFFLINE, DROPPED, 1);
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>             // set constraints on states.
>>>
>>>
>>>
>>>             // static constraint
>>>
>>>
>>>
>>>             builder.upperBound(MASTER, 1);
>>>
>>>
>>>
>>>             // dynamic constraint, R means it should be derived based
>>> on the replication
>>>
>>>
>>>             // factor.
>>>
>>>
>>>
>>>             builder.dynamicUpperBound(SLAVE, "R");
>>>
>>
>>
>

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

Thank you Kanak and Kishore! I will try enforcing the per-partition
constraint and let you know if somehow it does not work. I was looking at
the throttling documentation, but somehow missed that a per-partition
constraint was an option!

Regards,
Vlad


On Thu, Apr 3, 2014 at 5:42 PM, kishore g <g....@gmail.com> wrote:

> Hi Vlad,
>
> You can try setting the transition priority order and a constraint that
> there should be only one transition per partition across the cluster.
>
> So the transition priority could be something like
>
> Slave->Master
> Offline->Bootstrap
> Bootstrap->Slave
> Slave->Master
>
> For the rest, I am not sure if the order matters.
>
> Also set the max transitions constraint to 1 per partition.
>
> The reason I put Slave->Master before Offline->Bootstrap is to ensure that
> availability is given more importance. For example, suppose you have 3 nodes,
> N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1 goes down and N3
> comes up at the same time, we probably don't want to wait for N3 to
> bootstrap before promoting N2 to Master.
>
> I haven't tested this but assuming the constraints enforcement works, this
> should do the trick.
>
> Does this make sense? Let me know if this does not work, we can add a test
> case.
>
> thanks,
> Kishore G
>
>
>
>
>
>
> On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com>wrote:
>
>>
>> Dear all,
>>
>> I am trying to construct a state model with the following transition
>> diagram:
>>
>> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>>          <-----------------------------------
>>
>> That is, an offline node can go into a bootstrapping state, from the
>> bootstrapping state it can go into a slave state,
>> from slave it can go to master, from master to slave, and from slave it
>> can go offline.
>>
>> Assume that I have two nodes, pf1 and pf2, and a
>> partition partition_0 with the following ideal state:
>>
>> partition_0: pf2: MASTER pf1: SLAVE,
>>
>> and that currently pf1 is serving as a master. When pf2 boots, Helix will
>> issue, almost simultaneously, two commands:
>> for pf1: transition from MASTER to SLAVE
>> for pf2: transition from BOOTSTRAPPING to SLAVE
>>
>> My understanding is that this happens since Helix is trying to execute as
>> many commands in parallel and since the last state
>> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
>> for pf2 involves a long data copy step, so
>> I would like to keep pf1 as a master in the meanwhile. I tried
>> prioritizing the transition from BOOTSTRAPPING to SLAVE
>> over the transition from MASTER to SLAVE, however Helix still issues them
>> in parallel (as it should).
>>
>> I was wondering what my options would be in order to keep the master up
>> while the future master is bootstrapping. Could
>> a throttling in the number of transitions be enforced at partition level?
>> Could I somehow specify that a state with a slave
>> and a bootstrapping node is undesirable?
>>
>> As a note, I have also looked at the RSync-replicated filesystem
>> example. The reason for not using the OfflineOnline or the
>> MasterSlave model in my application is that I would like the
>> bootstrapping node to receive updates from clients, i.e. be visible
>> during the bootstrap. For this reason, I am introducing the new
>> BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>>
>> Regards,
>> Vlad
>>
>>
>> PS: The state model definition is as follows:
>>
>> builder.addState(MASTER, 1);
>>
>>
>>
>>             builder.addState(SLAVE, 2);
>>
>>
>>
>>             builder.addState(BOOTSTRAP, 3);
>>
>>
>>
>>             builder.addState(OFFLINE);
>>
>>
>>
>>             builder.addState(DROPPED);
>>
>>
>>
>>             // Set the initial state when the node starts
>>
>>
>>
>>             builder.initialState(OFFLINE);
>>
>>
>>
>>
>>
>>
>>
>>             // Add transitions between the states.
>>
>>
>>
>>             builder.addTransition(OFFLINE, BOOTSTRAP, 4);
>>
>>
>>
>>             builder.addTransition(BOOTSTRAP, SLAVE, 5);
>>
>>
>>
>>             builder.addTransition(SLAVE, MASTER, 6);
>>
>>
>>
>>             builder.addTransition(MASTER, SLAVE, 3);
>>
>>
>>
>>             builder.addTransition(SLAVE, OFFLINE, 2);
>>
>>
>>
>>             builder.addTransition(OFFLINE, DROPPED, 1);
>>
>>
>>
>>
>>
>>
>>
>>             // set constraints on states.
>>
>>
>>
>>             // static constraint
>>
>>
>>
>>             builder.upperBound(MASTER, 1);
>>
>>
>>
>>             // dynamic constraint, R means it should be derived based on
>> the replication
>>
>>
>>             // factor.
>>
>>
>>
>>             builder.dynamicUpperBound(SLAVE, "R");
>>
>
>

Re: keeping the master node up during bootstrap

Posted by kishore g <g....@gmail.com>.

Hi Vlad,

You can try setting the transition priority order and a constraint that
there should be only one transition per partition across the cluster.

So the transition priority could be something like

Slave->Master
Offline->Bootstrap
Bootstrap->Slave
Slave->Master

For the rest, I am not sure if the order matters.

Also set the max transitions constraint to 1 per partition.

The reason I put Slave->Master before Offline->Bootstrap is to ensure that
availability is given more importance. For example, suppose you have 3 nodes,
N1, N2, N3. N1 is Master, N2 is Slave, and N3 is down. If N1 goes down and N3
comes up at the same time, we probably don't want to wait for N3 to
bootstrap before promoting N2 to Master.

I haven't tested this but assuming the constraints enforcement works, this
should do the trick.

Does this make sense? Let me know if this does not work, we can add a test
case.
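
To make the idea concrete, here is a rough, self-contained sketch of how a
priority order combined with a limit of one transition per partition might
pick which message goes out first. It is a toy model and not Helix code: the
class ThrottleSketch, the Transition type and the select method are all
hypothetical names, and the last entry of the priority list (MASTER-SLAVE)
is an assumption on my part. It also looks at a single round only and ignores
messages that are already in flight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these names are Helix APIs. */
public class ThrottleSketch {

    /** One pending state transition for a replica of a partition. */
    static final class Transition {
        final String partition, node, from, to;

        Transition(String partition, String node, String from, String to) {
            this.partition = partition;
            this.node = node;
            this.from = from;
            this.to = to;
        }

        /** Transition type, e.g. "BOOTSTRAP-SLAVE". */
        String key() {
            return from + "-" + to;
        }

        @Override
        public String toString() {
            return node + ": " + key();
        }
    }

    /** Position of a transition type in the priority list; unlisted types go last. */
    static int rank(List<String> priority, String key) {
        int i = priority.indexOf(key);
        return i < 0 ? Integer.MAX_VALUE : i;
    }

    /**
     * Visits the pending transitions in priority order and admits at most
     * maxPerPartition of them for each partition.
     */
    static List<Transition> select(List<Transition> pending,
                                   List<String> priority,
                                   int maxPerPartition) {
        List<Transition> ordered = new ArrayList<>(pending);
        ordered.sort(Comparator.comparingInt(t -> rank(priority, t.key())));

        Map<String, Integer> admitted = new HashMap<>();
        List<Transition> selected = new ArrayList<>();
        for (Transition t : ordered) {
            int count = admitted.getOrDefault(t.partition, 0);
            if (count < maxPerPartition) {
                admitted.put(t.partition, count + 1);
                selected.add(t);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed priority order; the final entry is a guess, see above.
        List<String> priority = List.of(
                "SLAVE-MASTER", "OFFLINE-BOOTSTRAP", "BOOTSTRAP-SLAVE", "MASTER-SLAVE");

        // pf1 is currently MASTER, pf2 has just reached BOOTSTRAP.
        List<Transition> pending = List.of(
                new Transition("partition_0", "pf1", "MASTER", "SLAVE"),
                new Transition("partition_0", "pf2", "BOOTSTRAP", "SLAVE"));

        System.out.println(select(pending, priority, 1));
    }
}
```

With the limit set to 1, only the BOOTSTRAP-SLAVE transition for pf2 is
selected in this model, so pf1 would stay MASTER while pf2 copies its data.
With a limit of 2, both transitions would be selected in the same round, which
is the behaviour described in the original question.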

thanks,
Kishore G






On Thu, Apr 3, 2014 at 4:57 PM, vlad.gm@gmail.com <vl...@gmail.com> wrote:

>
> Dear all,
>
> I am trying to construct a state model with the following transition
> diagram:
>
> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
>          <-----------------------------------
>
> That is, an offline node can go into a bootstrapping state, from the
> bootstrapping state it can go into a slave state,
> from slave it can go to master, from master to slave, and from slave it
> can go offline.
>
> Assume that I have two nodes, pf1 and pf2, and a
> partition partition_0 with the following ideal state:
>
> partition_0: pf2: MASTER pf1: SLAVE,
>
> and that currently pf1 is serving as a master. When pf2 boots, Helix will
> issue, almost simultaneously, two commands:
> for pf1: transition from MASTER to SLAVE
> for pf2: transition from BOOTSTRAPPING to SLAVE
>
> My understanding is that this happens since Helix is trying to execute as
> many commands in parallel and since the last state
> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE for
> pf2 involves a long data copy step, so
> I would like to keep pf1 as a master in the meanwhile. I tried
> prioritizing the transition from BOOTSTRAPPING to SLAVE
> over the transition from MASTER to SLAVE, however Helix still issues them
> in parallel (as it should).
>
> I was wondering what my options would be in order to keep the master up
> while the future master is bootstrapping. Could
> a throttling in the number of transitions be enforced at partition level?
> Could I somehow specify that a state with a slave
> and a bootstrapping node is undesirable?
>
> As a note, I have also looked at the rsync-replicated filesystem example.
> The reason for not using the OfflineOnline or the
> MasterSlave model in my application is that I would like the bootstrapping
> node to receive updates from clients, i.e. be visible
> during the bootstrap. For this reason, I am introducing the new
> BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
>
> Regards,
> Vlad
>
>
> PS: The state model definition is as follows:
>
> builder.addState(MASTER, 1);
> builder.addState(SLAVE, 2);
> builder.addState(BOOTSTRAP, 3);
> builder.addState(OFFLINE);
> builder.addState(DROPPED);
>
> // Set the initial state when the node starts
> builder.initialState(OFFLINE);
>
> // Add transitions between the states.
> builder.addTransition(OFFLINE, BOOTSTRAP, 4);
> builder.addTransition(BOOTSTRAP, SLAVE, 5);
> builder.addTransition(SLAVE, MASTER, 6);
> builder.addTransition(MASTER, SLAVE, 3);
> builder.addTransition(SLAVE, OFFLINE, 2);
> builder.addTransition(OFFLINE, DROPPED, 1);
>
> // set constraints on states.
> // static constraint
> builder.upperBound(MASTER, 1);
> // dynamic constraint, R means it should be derived based on
> // the replication factor.
> builder.dynamicUpperBound(SLAVE, "R");
>
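
The six transitions in the quoted state model can be checked without Helix.
The following is a minimal sketch in plain Java, assuming only the
transitions listed above and using plain strings rather than Helix types;
the class TransitionPath and its path method are hypothetical names. It
runs a breadth-first search to find the shortest chain of transitions
between two states, which shows that a node starting OFFLINE has to pass
through BOOTSTRAP and SLAVE before it can become MASTER.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TransitionPath {
    // Transitions copied from the quoted state model (priorities omitted).
    static final Map<String, List<String>> NEXT = new HashMap<>();

    static void add(String from, String to) {
        NEXT.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    static {
        add("OFFLINE", "BOOTSTRAP");
        add("BOOTSTRAP", "SLAVE");
        add("SLAVE", "MASTER");
        add("MASTER", "SLAVE");
        add("SLAVE", "OFFLINE");
        add("OFFLINE", "DROPPED");
    }

    /** Shortest chain of states from 'from' to 'to'; empty if unreachable. */
    static List<String> path(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(to)) {
                // Walk the parent links back to the start state.
                LinkedList<String> result = new LinkedList<>();
                for (String s = cur; s != null; s = parent.get(s)) {
                    result.addFirst(s);
                }
                return result;
            }
            for (String n : NEXT.getOrDefault(cur, Collections.emptyList())) {
                if (!parent.containsKey(n)) {
                    parent.put(n, cur);
                    queue.add(n);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(path("OFFLINE", "MASTER"));
        // prints [OFFLINE, BOOTSTRAP, SLAVE, MASTER]
    }
}
```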

Re: keeping the master node up during bootstrap

Posted by "vlad.gm@gmail.com" <vl...@gmail.com>.

Hi Kanak,

I am using the SEMI_AUTO rebalancing mode. The code for setting up the
cluster is a bit of a handful, but I have copied it below (it is a command
line utility similar to helix_admin.sh).

My sequence of operations is as follows:
  setup_cluster
  setup_model
  setup_resource
  add_node for all the nodes
  add_nodes_to_resource (if there are no nodes, this calls rebalance;
  otherwise it is an expansion, as can be seen in the corresponding function).

Regards,
Vlad

// Some lines are omitted: imports and references to proprietary libraries


/* Setup a Helix cluster for the KVStore */
public static void setupCluster() {
    assert(cluster != null);
    clusterSetup.addCluster(cluster, true);
}

/* Register the KVStore's state model for the KVStore cluster */
public static void setupStateModel() {
    assert(cluster != null);
    StateModelDefinition stateModel = KVHelixDefinitions.defineStateModel();
    clusterSetup.addStateModelDef(cluster,
        KVHelixDefinitions.STATE_MODEL_NAME.stringify(),
        stateModel);
}

/* Setup a new resource for a particular namespace */
private static void setupResource(int numPartitions) {
    assert(cluster != null);
    assert(resource != null);
    assert(numPartitions > 0);

    clusterSetup.addResourceToCluster(cluster,
        resource,
        numPartitions,
        KVHelixDefinitions.STATE_MODEL_NAME.toString(),
        RebalanceMode.SEMI_AUTO.toString());
}

/**
 * Add a node to the cluster
 * @param node the hostname of the new node
 * @param sid  the ServerId of the new node
 */
private static void addNode(String node, String sid) {
    assert(cluster != null);
    assert(node != null);

    clusterSetup.addInstanceToCluster(cluster,
        KVHelixDefinitions.nodeToInstance(node));
    clusterSetup.setConfig(ConfigScopeProperty.PARTICIPANT,
        KVHelixDefinitions.clusterName() + ',' +
        KVHelixDefinitions.nodeToInstance(node),
        "ServerId=" + sid);
}

/**
 * Replace a node with another in the cluster
 * @param nodes an array of node hostnames. We will be replacing
 * nodes[0] with nodes[1]
 */
private static void replaceNode(String[] nodes) {
    assert(cluster != null);
    assert(nodes != null);
    assert(nodes.length >= 2);

    admin.enableInstance(KVHelixDefinitions.clusterName(),
        KVHelixDefinitions.nodeToInstance(nodes[0]), false);

    clusterSetup.swapInstance(KVHelixDefinitions.clusterName(),
        KVHelixDefinitions.nodeToInstance(nodes[0]),
        KVHelixDefinitions.nodeToInstance(nodes[1]));
}

/**
 * Expand a resource over a few new nodes, moving as few
 * partitions as possible in order to rebalance the partitions
 * over the cluster.
 * @param nodes a list of new node hostnames
 */
private static void addNodesToResource(String[] nodes) {
    assert(cluster != null);
    assert(nodes != null);
    assert(resource != null);

    IdealState currentState = admin.getResourceIdealState(cluster, resource);

    List<String> newNodes = new ArrayList<String>();
    for (String node : nodes) {
        admin.addInstanceTag(cluster,
            KVHelixDefinitions.nodeToInstance(node),
            resource);
        newNodes.add(KVHelixDefinitions.nodeToInstance(node));
    }

    if (currentState.getRecord().getListFields().size() == 0) {
        // No nodes assigned yet: do the initial rebalance
        clusterSetup.rebalanceResource(cluster,
            resource,
            KVHelixDefinitions.NUM_REPLICAS);
    } else {
        // Nodes already assigned: expand over the new ones
        clusterSetup.expandResource(cluster, resource);
    }
}





On Thu, Apr 3, 2014 at 5:20 PM, Kanak Biscuitwala <ka...@hotmail.com> wrote:

> Hi Vlad,
>
> What rebalance mode are you using (FULL_AUTO, SEMI_AUTO, CUSTOMIZED, or
> USER_DEFINED)? Can you also paste the code you're using to set up your
> state model, if possible?
>
> Thanks,
> Kanak
>
> ________________________________
> > Date: Thu, 3 Apr 2014 16:57:59 -0700
> > Subject: keeping the master node up during bootstrap
> > From: vlad.gm@gmail.com
> > To: user@helix.apache.org
> >
> >
> > Dear all,
> >
> > I am trying to construct a state model with the following transition
> > diagram:
> >
> > OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER
> >          <-----------------------------------
> >
> > That is, an offline node can go into a bootstrapping state, from the
> > bootstrapping state it can go into a slave state,
> > from slave it can go to master, from master to slave, and from slave
> > it can go offline.
> >
> > Assume that I have two nodes, pf1 and pf2, and a
> > partition partition_0 with the following ideal state:
> >
> > partition_0: pf2: MASTER pf1: SLAVE,
> >
> > and that currently pf1 is serving as a master. When pf2 boots, Helix
> > will issue, almost simultaneously, two commands:
> > for pf1: transition from MASTER to SLAVE
> > for pf2: transition from BOOTSTRAPPING to SLAVE
> >
> > My understanding is that this happens since Helix is trying to execute
> > as many commands in parallel as possible, and since the last state
> > has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE
> > for pf2 involves a long data copy step, so
> > I would like to keep pf1 as a master in the meanwhile. I tried
> > prioritizing the transition from BOOTSTRAPPING to SLAVE
> > over the transition from MASTER to SLAVE, however Helix still issues
> > them in parallel (as it should).
> >
> > I was wondering what my options would be in order to keep the master up
> > while the future master is bootstrapping. Could
> > a throttling in the number of transitions be enforced at partition
> > level? Could I somehow specify that a state with a slave
> > and a bootstrapping node is undesirable?
> >
> > As a note, I have also looked at the rsync-replicated filesystem
> > example. The reason for not using the OfflineOnline or the
> > MasterSlave model in my application is that I would like the
> > bootstrapping node to receive updates from clients, i.e. be visible
> > during the bootstrap. For this reason, I am introducing the new
> > BOOTSTRAPPING phase in-between OFFLINE and SLAVE.
> >
> > Regards,
> > Vlad
> >
> >
> > PS: The state model definition is as follows:
> >
> > builder.addState(MASTER, 1);
> > builder.addState(SLAVE, 2);
> > builder.addState(BOOTSTRAP, 3);
> > builder.addState(OFFLINE);
> > builder.addState(DROPPED);
> >
> > // Set the initial state when the node starts
> > builder.initialState(OFFLINE);
> >
> > // Add transitions between the states.
> > builder.addTransition(OFFLINE, BOOTSTRAP, 4);
> > builder.addTransition(BOOTSTRAP, SLAVE, 5);
> > builder.addTransition(SLAVE, MASTER, 6);
> > builder.addTransition(MASTER, SLAVE, 3);
> > builder.addTransition(SLAVE, OFFLINE, 2);
> > builder.addTransition(OFFLINE, DROPPED, 1);
> >
> > // set constraints on states.
> > // static constraint
> > builder.upperBound(MASTER, 1);
> > // dynamic constraint, R means it should be derived based on
> > // the replication factor.
> > builder.dynamicUpperBound(SLAVE, "R");
>

RE: keeping the master node up during bootstrap

Posted by Kanak Biscuitwala <ka...@hotmail.com>.

Hi Vlad,

What rebalance mode are you using (FULL_AUTO, SEMI_AUTO, CUSTOMIZED, or USER_DEFINED)? Can you also paste the code you're using to set up your state model, if possible?

Thanks,
Kanak

________________________________
> Date: Thu, 3 Apr 2014 16:57:59 -0700 
> Subject: keeping the master node up during bootstrap 
> From: vlad.gm@gmail.com 
> To: user@helix.apache.org 
> 
> 
> Dear all, 
> 
> I am trying to construct a state model with the following transition diagram: 
> 
> OFFLINE -> BOOTSTRAPPING <---> SLAVE <-----> MASTER 
>          <-----------------------------------
> 
> That is, an offline node can go into a bootstrapping state, from the
> bootstrapping state it can go into a slave state,
> from slave it can go to master, from master to slave, and from slave
> it can go offline.
>
> Assume that I have two nodes, pf1 and pf2, and a
> partition partition_0 with the following ideal state:
> 
> partition_0: pf2: MASTER pf1: SLAVE, 
> 
> and that currently pf1 is serving as a master. When pf2 boots, Helix 
> will issue, almost simultaneously, two commands: 
> for pf1: transition from MASTER to SLAVE 
> for pf2: transition from BOOTSTRAPPING to SLAVE 
> 
> My understanding is that this happens since Helix is trying to execute
> as many commands in parallel as possible, and since the last state
> has pf2 as master. However, the transition from BOOTSTRAPPING to SLAVE 
> for pf2 involves a long data copy step, so 
> I would like to keep pf1 as a master in the meanwhile. I tried 
> prioritizing the transition from BOOTSTRAPPING to SLAVE 
> over the transition from MASTER to SLAVE, however Helix still issues 
> them in parallel (as it should). 
> 
> I was wondering what my options would be in order to keep the master up 
> while the future master is bootstrapping. Could 
> a throttling in the number of transitions be enforced at partition 
> level? Could I somehow specify that a state with a slave 
> and a bootstrapping node is undesirable? 
> 
> As a note, I have also looked at the rsync-replicated filesystem
> example. The reason for not using the OfflineOnline or the 
> MasterSlave model in my application is that I would like the 
> bootstrapping node to receive updates from clients, i.e. be visible 
> during the bootstrap. For this reason, I am introducing the new 
> BOOTSTRAPPING phase in-between OFFLINE and SLAVE. 
> 
> Regards, 
> Vlad 
> 
> 
> PS: The state model definition is as follows:
>
> builder.addState(MASTER, 1);
> builder.addState(SLAVE, 2);
> builder.addState(BOOTSTRAP, 3);
> builder.addState(OFFLINE);
> builder.addState(DROPPED);
>
> // Set the initial state when the node starts
> builder.initialState(OFFLINE);
>
> // Add transitions between the states.
> builder.addTransition(OFFLINE, BOOTSTRAP, 4);
> builder.addTransition(BOOTSTRAP, SLAVE, 5);
> builder.addTransition(SLAVE, MASTER, 6);
> builder.addTransition(MASTER, SLAVE, 3);
> builder.addTransition(SLAVE, OFFLINE, 2);
> builder.addTransition(OFFLINE, DROPPED, 1);
>
> // set constraints on states.
> // static constraint
> builder.upperBound(MASTER, 1);
> // dynamic constraint, R means it should be derived based on
> // the replication factor.
> builder.dynamicUpperBound(SLAVE, "R");