Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/05/14 14:26:29 UTC

[GitHub] [accumulo-website] milleruntime commented on a change in pull request #282: Documentation for external compactions

milleruntime commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r632565769



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server. External compactions introduce two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: an Accumulo process that runs external compactions. It is started with the name of the queue whose compactions it will perform. A typical deployment runs many of these processes, some for queue A, some for queue B, and so on. Each Compactor runs only one compaction at a time and communicates with the Compaction Coordinator to get a compaction job and to report that job's status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment, one instance of this process is active at a time, with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the Tablet Servers to get external compaction job information and to report the status of those jobs back to them.
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure its executor with an
+`externalQueue` named `DCQ1`. We then set the Compaction Dispatcher for table `testTable` and
+configure the table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, each tablet server maintains an in-memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tablet servers for summary information about their external compaction queues and combines that information to determine which tablet server to contact next for work. The coordinator does not maintain per tablet information; it maintains only enough information to know which tablet server to contact next for a given queue. That tablet server then determines which specific tablet in the queue needs to compact.
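
The overview paragraph above describes a pull model: each tablet server keeps per-queue priority queues of its own tablets, the Coordinator keeps only per-queue summaries from each tablet server, and a Compactor asks the Coordinator for work on its queue. The Java sketch below models that flow in a single process under stated assumptions: every class name (`ExternalCompactionSketch`, `TabletServer`, `Coordinator`, `Compactor`, `Job`), the integer priority, and the shape of the summary (highest pending priority per queue) are hypothetical and are not Accumulo's API. It omits RPC, metadata updates, failover, and concurrency.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.PriorityQueue;

// Hypothetical in-process sketch, not Accumulo's API: no RPC, metadata writes, or failure handling.
public class ExternalCompactionSketch {

    // Hypothetical job: which tablet to compact, for which queue, at what priority.
    record Job(String tablet, String queue, int priority) {}

    // Tablet server: one in-memory priority queue of its own tablets per external queue.
    static class TabletServer {
        private final Map<String, PriorityQueue<Job>> queues = new HashMap<>();

        void addJob(Job job) {
            queues.computeIfAbsent(job.queue(),
                    q -> new PriorityQueue<>(Comparator.comparingInt(Job::priority).reversed()))
                .add(job);
        }

        // Summary only: the highest pending priority per queue, no tablet names.
        Map<String, Integer> summarize() {
            Map<String, Integer> summary = new HashMap<>();
            queues.forEach((q, pq) -> {
                if (!pq.isEmpty()) summary.put(q, pq.peek().priority());
            });
            return summary;
        }

        // The tablet server, not the coordinator, picks the specific tablet.
        Optional<Job> reserve(String queue) {
            PriorityQueue<Job> pq = queues.get(queue);
            return pq == null ? Optional.empty() : Optional.ofNullable(pq.poll());
        }
    }

    // Coordinator: keeps only queue -> (tablet server -> top priority), refreshed by polling.
    static class Coordinator {
        private final List<TabletServer> tservers;
        private final Map<String, Map<TabletServer, Integer>> summaries = new HashMap<>();

        Coordinator(List<TabletServer> tservers) { this.tservers = tservers; }

        void pollTabletServers() {
            summaries.clear();
            for (TabletServer ts : tservers) {
                ts.summarize().forEach((q, prio) ->
                    summaries.computeIfAbsent(q, k -> new HashMap<>()).put(ts, prio));
            }
        }

        // Called by a Compactor: contact the tablet server with the best summary for the queue.
        Optional<Job> getCompactionJob(String queue) {
            Map<TabletServer, Integer> byServer = summaries.getOrDefault(queue, new HashMap<>());
            while (!byServer.isEmpty()) {
                TabletServer best = byServer.entrySet().stream()
                    .max(Map.Entry.comparingByValue()).get().getKey();
                Optional<Job> job = best.reserve(queue);
                // Refresh or drop this server's summary so stale entries are not retried.
                Integer next = best.summarize().get(queue);
                if (next == null) byServer.remove(best); else byServer.put(best, next);
                if (job.isPresent()) return job;
            }
            return Optional.empty();
        }
    }

    // Compactor: started for a single queue and runs one job at a time.
    static class Compactor {
        final String queue;
        final List<String> completed = new ArrayList<>();
        Compactor(String queue) { this.queue = queue; }

        boolean runOnce(Coordinator coordinator) {
            Optional<Job> job = coordinator.getCompactionJob(queue);
            // Stand-in for running the compaction and reporting its status.
            job.ifPresent(j -> completed.add(j.tablet()));
            return job.isPresent();
        }
    }

    public static void main(String[] args) {
        TabletServer ts1 = new TabletServer();
        TabletServer ts2 = new TabletServer();
        ts1.addJob(new Job("t1", "DCQ1", 5));
        ts1.addJob(new Job("t2", "DCQ1", 1));
        ts2.addJob(new Job("t3", "DCQ1", 9));
        Coordinator coordinator = new Coordinator(List.of(ts1, ts2));
        coordinator.pollTabletServers();
        Compactor compactor = new Compactor("DCQ1");
        while (compactor.runOnce(coordinator)) { } // drain the queue one job at a time
        System.out.println(compactor.completed); // [t3, t1, t2]
    }
}
```

In this sketch, running `main` prints `[t3, t1, t2]`: the Coordinator sends the Compactor to `ts2` first because its summary reports the highest priority, and tablet names appear only in the jobs that tablet servers hand back.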

Review comment:
       Is the part about tablet information, `The coordinator does not maintain per tablet information`, still valid? If so, how does the coordinator know which tablet in the metadata to write the external compaction entry for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org