Posted to commits@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2022/09/09 14:32:44 UTC

[GitHub] [bookkeeper] lordcheng10 commented on issue #3456: BP-56: Support data migration

lordcheng10 commented on issue #3456:
URL: https://github.com/apache/bookkeeper/issues/3456#issuecomment-1242050994

   ### Motivation
   **Currently, BookKeeper does not support data migration, only data recovery.**
   
   In our scenario, a large number of bookie nodes need to be taken offline, and the existing procedure for taking a bookie offline is very time-consuming.
   
   The current steps for taking bookies offline are as follows:
   1. Set the bookie nodes to be taken offline to read-only mode;
   2. Wait for the Pulsar data on those bookies to expire and be deleted;
   3. Even after most of the data on these bookies has expired and been cleaned up, some data that cannot expire still remains;
   4. Stop one bookie, then use the decommission command to migrate its remaining (unexpired) data to new nodes:
   `bin/bookkeeper shell decommissionbookie -bookieid xx`
   5. Once the data on that bookie has been migrated, move on to the next bookie.
   
   Step 4 is very time-consuming: migrating the data of a single bookie takes about 1 hour, and we have 125 bookie nodes to take offline.
   
   Step 2 is also very time-consuming: depending on the Pulsar retention time, it usually takes more than ten hours.
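A rough estimate of step 4 alone from the figures above, assuming the bookies are decommissioned strictly one after another (step 5) and each takes the observed ~1 hour:

```math
T_{\text{step 4}} \approx N_{\text{bookies}} \times t_{\text{per bookie}} = 125 \times 1\,\text{h} = 125\,\text{h} \approx 5.2\ \text{days}
```

This comes on top of the ten-plus hours of waiting in step 2.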
   
   ### Proposal
   To solve this problem, we developed a data migration tool.
   With this tool, the offline steps become:
   1. Execute the data migration command:
   `bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --ledgerIds ALL --readOnly true`
   2. When the data migration has completed, stop all the bookie nodes being taken offline.
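Conceptually, migrating a ledger's replicas off some bookies means rewriting every ensemble that contains one of those bookies so that another bookie takes its place (and then copying the affected entries to it). The following is a minimal, hypothetical Python model of only the replacement step: `replace_bookies`, the dict-based ensemble metadata, and the `candidates` pool are assumptions for illustration, not the BP-56 implementation, and entry copying, rack awareness, and placement policies are ignored.

```python
from typing import Dict, List, Set

# Hypothetical model: maps the first entry id of each ensemble segment
# to the list of bookies that store that segment.
Ensembles = Dict[int, List[str]]


def replace_bookies(ensembles: Ensembles,
                    migrating: Set[str],
                    candidates: List[str]) -> Ensembles:
    """Return new ensembles in which every bookie in `migrating` is replaced
    by a candidate that is not already a member of that ensemble.
    The input dict is not modified."""
    result: Ensembles = {}
    for first_entry, members in ensembles.items():
        new_members = list(members)  # copy so the input stays untouched
        for i, bookie in enumerate(members):
            if bookie not in migrating:
                continue
            # First candidate that is neither already in this ensemble
            # nor itself being migrated away.
            replacement = next(
                (c for c in candidates
                 if c not in new_members and c not in migrating),
                None,
            )
            if replacement is None:
                raise RuntimeError(
                    f"no replacement for {bookie} in segment {first_entry}")
            new_members[i] = replacement
        result[first_entry] = new_members
    return result


if __name__ == "__main__":
    ledger = {0: ["bookie1", "bookie3", "bookie4"],
              100: ["bookie2", "bookie3", "bookie5"]}
    print(replace_bookies(ledger, {"bookie1", "bookie2"}, ["bookie6", "bookie7"]))
    # {0: ['bookie6', 'bookie3', 'bookie4'], 100: ['bookie6', 'bookie3', 'bookie5']}
```

Picking the first free candidate keeps the sketch deterministic; a real tool would more likely defer to BookKeeper's ensemble placement policy when choosing replacements.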
   
   This command can also migrate the replicas of selected ledgers on some bookie nodes to other nodes, for example:
   
   `bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --ledgerIds ledger1,ledger2,ledger3 --readOnly false`
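Which ledgers are migrated depends on `--ledgerIds`: either `ALL` or an explicit comma-separated list. A minimal Python sketch of one plausible selection rule, assuming that `ALL` means every ledger with at least one replica on the given bookies, that an explicit list is further narrowed to ledgers that actually touch those bookies, and that ledger ids are numeric; `select_ledgers` and the in-memory metadata map are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical model: first entry id of each segment -> bookies for it.
Ensembles = Dict[int, List[str]]


def select_ledgers(ledger_ids_arg: str,
                   bookie_ids_arg: str,
                   metadata: Dict[int, Ensembles]) -> List[int]:
    """Return the sorted ids of ledgers that have a replica on any of the
    given bookies, limited to `ledger_ids_arg` unless it is "ALL".
    Ids in the list that are not present in `metadata` are ignored."""
    bookies: Set[str] = {b.strip() for b in bookie_ids_arg.split(",") if b.strip()}
    if ledger_ids_arg.strip() == "ALL":
        wanted = set(metadata)
    else:
        wanted = {int(x) for x in ledger_ids_arg.split(",") if x.strip()}
    selected: List[int] = []
    for ledger_id in sorted(wanted & set(metadata)):
        segments = metadata[ledger_id]
        # Keep the ledger if any segment stores a replica on a listed bookie.
        if any(b in bookies for members in segments.values() for b in members):
            selected.append(ledger_id)
    return selected


if __name__ == "__main__":
    metadata = {
        1: {0: ["bookie1", "bookie3"]},
        2: {0: ["bookie3", "bookie4"]},
        3: {0: ["bookie4", "bookie5"], 50: ["bookie2", "bookie5"]},
    }
    print(select_ledgers("ALL", "bookie1,bookie2", metadata))  # [1, 3]
    print(select_ledgers("1,2", "bookie1,bookie2", metadata))  # [1]
```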
   
   
   ### Examples
   1. Migrate all ledger data on bookie1 and bookie2 to other bookie nodes:
   `sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -ledgerIds ALL -readOnly true`
   2. Migrate the replicas of ledger1 and ledger3 stored on bookie1 and bookie2 to other bookie nodes:
   `sh bin/bookkeeper shell replicasMigration -bookieIds bookie1,bookie2 -ledgerIds ledger1,ledger3 -readOnly false`
   
   
   ### Application scenarios:
   **1. Taking bookie nodes offline:**
   As mentioned above, with this data migration tool, taking bookies offline requires only two steps and takes far less time:
   a. Execute the data migration command:
   `bin/bookkeeper shell replicasMigration --bookieIds bookie1,bookie2 --ledgerIds ALL --readOnly true`
   b. When the data migration has completed, stop all the bookie nodes being taken offline.
   
   **2. Adding bookie nodes to improve the read speed of historical data:**
   a. When clients consume historical data from a few days ago, we want to speed up those reads by adding bookie nodes.
   b. However, new bookie nodes added to the cluster only serve reads and writes of new data, so they cannot improve the read speed of historical data.
   c. With this data migration tool, we can migrate some historical data to the new nodes, so that they serve part of the historical reads and the read speed of historical data improves.
   

