Posted to user@accumulo.apache.org by Shailesh Ligade <SL...@FBI.GOV> on 2021/08/18 11:53:00 UTC

RE: [EXTERNAL EMAIL] - Re: [External] RE: how to decommission tablet server

Thanks

So in reality, if I issue admin stop on the tserver (with -f if needed), I don’t need to stop the Linux service, right?

Also, when is it safe to update the slaves file? Can I wait until I have decommissioned all my nodes, or do I need to do that after each node is decommissioned?

Appreciated

-S


From: Mike Miller <mm...@apache.org>
Sent: Wednesday, August 18, 2021 7:47 AM
To: user@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - Re: [External] RE: how to decommission tablet server

The admin stop command issues a graceful shutdown to Accumulo for that tserver. There is a force option you could try {"-f", "--force"} that will remove the lock. But these are more graceful than a linux kill -9 command, which you may have to do if the admin command doesn't kill the process entirely.
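
The escalation above (graceful stop, then forced stop, then kill -9) could be sketched roughly as below. This is only a sketch under assumptions: the function names are made up for illustration, the pgrep/pkill pattern is a guess at how the tserver JVM shows up in the process list, and the process check only means something when run on the tserver host itself.

```shell
# Assumption: the tserver JVM command line contains "Main tserver".
# Adjust the pattern to match your installation.
tserver_running() {
  pgrep -f 'org.apache.accumulo.start.Main tserver' >/dev/null
}

# Hypothetical helper: escalate from graceful stop to forced stop to kill -9.
# $1 = tserver host, $2 = seconds to wait between steps (default 60).
stop_tserver() {
  host="$1"
  wait_secs="${2:-60}"

  # Graceful shutdown request for this tserver (default port 9997).
  accumulo admin stop "${host}:9997"
  sleep "$wait_secs"
  tserver_running || return 0

  # Forced variant: removes the server's lock.
  accumulo admin stop -f "${host}:9997"
  sleep "$wait_secs"
  tserver_running || return 0

  # Last resort, only if the process is still there.
  pkill -9 -f 'org.apache.accumulo.start.Main tserver'
}
```

Example: stop_tserver tserver1.example.com 120 (the host name is a placeholder).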

On Wed, Aug 18, 2021 at 7:31 AM Ligade, Shailesh [USA] <Li...@bah.com> wrote:
Thank you for the good explanation! I really appreciate that.

Yes I need to remove the hardware, meaning I need to stop everything on the server (tserver and datanode)

One quick question:

What is the difference between accumulo admin stop <tserver>:9997 and stopping tserver linux service?

When I issue admin stop, I can see from the monitor that the hosted tablets count for the tserver in question goes down to 0; however, it doesn't stop the tserver process or service.

In your steps, you are stopping the datanode service first (adding it to the exclude file, then running refreshNodes, then stopping the service). I was thinking of stopping the accumulo tserver and letting it handle its hosted tablets first, before touching the datanode. Will there be any difference? Just trying to understand the relationship between accumulo and hadoop.

Thank you!

-S
________________________________
From: dev1@etcoleman.com
Sent: Tuesday, August 17, 2021 2:39 PM
To: user@accumulo.apache.org
Subject: [External] RE: how to decommission tablet server


Maybe you could clarify.  Decommissioning tablet servers and hdfs replication are separate and distinct issues.  Accumulo is generally unaware of hdfs replication, and tablet assignment does not change the hdfs replication.  You can set the replication factor for a table – but that is only used on writes to hdfs – Accumulo assumes that once a write returns successfully, hdfs is managing the details.



When a tablet is assigned / migrated, the underlying files in hdfs are not changed – the file references are reassigned in a metadata operation, but the files themselves are not modified.  They keep whatever replication factor was assigned and whatever placement the namenode decides.



If you are removing servers that have both data nodes and tserver processes running:



If you stop / kill the tserver, the tablets assigned to that server will be reassigned rather quickly.  It is only a metadata update.  The exact timing will depend on your ZooKeeper time-out setting, but the “dead” tserver should be detected and its tablets reassigned in short order.  The reassignment may cause some churn of assignments if the cluster becomes unbalanced.  The manager (master) will select tablets from tservers that are over-subscribed and assign them to tservers that have fewer tablets – you can monitor the manager (master) debug log to see the migration progress.  If you want to be gentle, stop a tserver, wait for the number of unassigned tablets to hit zero and migration to settle, and then repeat.



If you want to stop the data nodes, you can do that independently of Accumulo – just follow the Hadoop data node decommission process.  Hadoop will move the data blocks assigned to the data node so that it is “safe” to then stop the data node process.  This is independent of Accumulo, and Accumulo will not be aware that the blocks are moving.  If you are running compactions, Accumulo may try to write blocks locally, but if the data node is rejecting new block assignments (which I rather assume it would when in decommission mode) then Accumulo still would not care.  If somehow new blocks were written, it may just delay the Hadoop data node decommissioning.
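
As a rough illustration of starting the Hadoop side (add the host to the exclude file, then run refreshNodes, as mentioned earlier in this thread), something like the following might do. Treat it as a sketch: the function name and the EXCLUDE_FILE default are assumptions, and the real path is whatever dfs.hosts.exclude points to in your hdfs-site.xml.

```shell
# Hypothetical helper: mark one data node for decommission.
# EXCLUDE_FILE must be the file named by dfs.hosts.exclude;
# the default path below is only a placeholder.
start_datanode_decommission() {
  host="$1"
  exclude_file="${EXCLUDE_FILE:-/etc/hadoop/conf/dfs.exclude}"

  # Add the host only once: -x matches whole lines, -F matches literally.
  grep -qxF "$host" "$exclude_file" 2>/dev/null || echo "$host" >> "$exclude_file"

  # Ask the NameNode to re-read its include/exclude files.
  hdfs dfsadmin -refreshNodes
}
```

Run it on a node where the exclude file is the one the NameNode actually reads.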



If you are running ingest while killing tservers – things should mostly work – there may be ingest failures, but normally things would get retried and the subsequent effort should succeed.  The issue may be that if, by bad luck, the work keeps getting assigned to tservers that are then killed, you could end up exceeding the number of retries and the ingest would fail outright.  If you can pause ingest, then this limits that chance.  If you can monitor your ingest and know when an ingest failed, you could just reschedule the ingest (for bulk import).  If you are doing continuous ingest, it may be harder to determine if a specific ingest fails, so you may need to select an appropriate range for replay.  Overall it may mostly work – it will depend on your processes and your tolerance for any particular data loss on an ingest.
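
If the ingest is a job you launch yourself (a bulk import script, say), rescheduling on failure could be as simple as a bounded retry wrapper. This is a generic sketch, not anything Accumulo provides: retry_ingest and RETRY_DELAY are made-up names, and the wrapped command is whatever starts your ingest.

```shell
# Hypothetical wrapper: run a command, retrying up to a fixed number of times.
# $1 = maximum attempts, remaining arguments = the ingest command to run.
retry_ingest() {
  max_tries="$1"
  shift
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Success on any attempt ends the loop.
    "$@" && return 0
    echo "ingest attempt $try of $max_tries failed" >&2
    # Pause before the next attempt, but not after the last one.
    if [ "$try" -lt "$max_tries" ]; then
      sleep "${RETRY_DELAY:-30}"
    fi
    try=$((try + 1))
  done
  return 1
}
```

Example: retry_ingest 3 ./run-bulk-import.sh (the script name is a placeholder).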



The modest approach (if you can accept transient errors):



1 Start the data node decommission process.

2 Pause ingest and cancel any running user compactions.

3 Stop a tserver and wait for unassigned tablets to go back to 0.  Wait for the tablet migration (if any) to quiet down.

4 Repeat 3 until all tserver processes have been stopped on the nodes you are removing.

5 Restart ingest – rerun any user compactions if you stopped any.

6 Wait for the hdfs decommission process to finish moving / replicating blocks.

7 Stop the data node process.

8 Do what you want with the node.
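
Steps 3, 4 and 6 above could be scripted along these lines. It is a sketch with loud assumptions: unassigned_tablets is a placeholder you must fill in yourself (this thread reads that number from the monitor, and no command for fetching it is given here), the function names are invented, and the awk parsing assumes the usual "Hostname:" and "Decommission Status :" lines in the hdfs dfsadmin -report output.

```shell
# Placeholder: print the current number of unassigned tablets.
# Replace the body with however you read that number on your cluster.
unassigned_tablets() {
  echo 0
}

# Poll until the count printed by unassigned_tablets reaches zero.
wait_for_assignment() {
  while [ "$(unassigned_tablets)" -gt 0 ]; do
    sleep "${POLL_SECS:-10}"
  done
}

# Steps 3 and 4: stop the tservers one at a time, waiting in between.
rolling_stop() {
  for host in "$@"; do
    accumulo admin stop "${host}:9997"
    wait_for_assignment
  done
}

# Step 6: succeeds once the report lists the host as Decommissioned.
# Assumes the report prints "Hostname: <host>" before each status line.
datanode_decommissioned() {
  hdfs dfsadmin -report | awk -v h="$1" '
    /^Hostname:/ { cur = $2 }
    /^Decommission Status/ && cur == h { print $NF }' | grep -qx 'Decommissioned'
}
```

Example: rolling_stop node1 node2, then loop on datanode_decommissioned node1 before stopping that data node. You may also want a short pause after each stop so the manager has time to notice the tserver is gone before the first poll.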



You do not need to schedule down time – if you can accept transient errors.  Say that a client scan is running and that tserver is stopped – the client may receive an error for the scan.  If the scan is resubmitted and the tablet has been reassigned it should work – it may pause for the reassignment and / or time out if the assignment takes some time.  You are basically playing a numbers game here – the number of tablets, the number of unassigned tablets, the odds that a scan would be using a particular tablet for the duration that it is unavailable.  It’s not guaranteed that it will fail, it’s just that there is a greater than 0 chance that it could – if that is unacceptable then:



1 Stop ingest – wait for all to finish or mark which ones will need to be rescheduled

2 Stop Accumulo

3 Remove the tservers from the servers list

4 Start Accumulo without starting the decommissioned tserver nodes.
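
For step 3, removing a host from the servers list is a plain file edit. A hedged sketch, where the function name and the SERVERS_FILE default are assumptions (this thread calls it the slaves file, and the location depends on your install):

```shell
# Hypothetical helper: drop one host from a hosts list file, keeping a backup.
# SERVERS_FILE defaults to a guessed location; set it to your real file.
remove_from_servers_list() {
  host="$1"
  file="${SERVERS_FILE:-$ACCUMULO_HOME/conf/slaves}"

  # Keep a copy of the original next to the file.
  cp "$file" "$file.bak"

  # -v inverts the match, -x matches whole lines, -F treats the host literally,
  # so removing node1 leaves node10 alone. grep exits 1 if nothing remains.
  grep -vxF "$host" "$file.bak" > "$file" || true
}
```

Example: SERVERS_FILE=/path/to/slaves remove_from_servers_list tserver1.example.com (both values are placeholders).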



Do what you want with the data node decommissioning.



The latter approach removes possible transient issues.  It is up to you to determine your tolerance for possible transient issues for the duration that tservers are being stopped vs a complete outage for the duration that Accumulo is down.  If it is a large cluster and just a few tservers, the odds of a specific tablet being offline for a short duration may be very low.  If it is a small cluster or the percentage of tservers that you are stopping is large, then the odds increase, but the issues will still be transient.  You need to decide which is acceptable to you and your circumstances.



From: Shailesh Ligade <SL...@FBI.GOV>
Sent: Tuesday, August 17, 2021 11:26 AM
To: user@accumulo.apache.org
Subject: RE: how to decommission tablet server



It would be helpful to know: when decommissioning tablet servers (one at a time, so the underlying hdfs can replicate), do we need accumulo downtime? Can accumulo keep ingesting while we are decommissioning tablet servers?



Thanks



-S



From: Shailesh Ligade <SL...@FBI.GOV>
Sent: Tuesday, August 17, 2021 8:52 AM
To: user@accumulo.apache.org
Subject: [EXTERNAL EMAIL] - how to decommission tablet server



Hello,



I am using accumulo 1.10 and want to remove a few tablet servers.



I saw in the documentation that I need to run



accumulo admin stop <tserver>:9997



That command comes back quickly. I am not sure how long, if at all, I have to wait before I stop the tserver service. When is the right time to stop the datanode service (running on the same server as the tablet server)? And when should I update the slaves files (for accumulo and hdfs)?



Any guidelines on this?



Thanks



-S







RE: [EXTERNAL EMAIL] - Re: [External] RE: how to decommission tablet server

Posted by de...@etcoleman.com.
If the admin stop fails to stop the service, you will need to either kill the Linux process or stop it with service stop.

 

The hosts file can be modified either before or after.  If you do it before, remember that things like admin start-all, stop-all will not know about those nodes.  Likewise, if you do it after, then those commands may try actions, like start, that you’d rather not happen.