Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/05/11 13:42:17 UTC

[GitHub] [accumulo-website] dlmarion opened a new pull request #282: Documentation for external compactions

dlmarion opened a new pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-website] jmark99 commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
jmark99 commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r630386201



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction jasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 

Review comment:
       There is a typo: jasks -> tasks







[GitHub] [accumulo-website] milleruntime commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-842422029


   This could definitely be follow-on work, but I was thinking it would be nice to update the design diagrams to include the external compaction components: https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/spi/compaction/doc-files/compaction-spi-design.png





[GitHub] [accumulo-website] dlmarion commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r634665156



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it to know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.
+
+When a Compactor is free to perform work, it asks the CompactionCoordinator for the next compaction job. The CompactionCoordinator contacts the next TabletServer that has the highest priority for the Compactor's queue. The TabletServer returns the information necessary for the compaction to occur to the CompactionCoordinator, which is passed on to the Compactor. The Compaction Coordinator maintains an in-memory list of running compactions and also inserts an entry into the metadata table for the tablet to denote that an external compaction is running. When the Compactor has finished the compaction, it notifies the CompactionCoordinator which inserts an entry into the metadata table to denote that the external compaction completed and it attempts to notify the TabletServer. If successful, the TabletServer commits the major compaction. If the TabletServer is down, or the Tablet has become hosted on a different TabletServer, then the CompactionCoordinator will fail to notify the TabletServer, but the metadata table entries will remain. The major compaction will be committed in the future by the TabletServer hosting the Tablet.
+
+### External Compaction in Action
+
+Below are some examples of log entries and metadata table entries for external compactions. First, here are some metadata entries for table `2` . You can see that there are three files of different sizes (file size and number of entries are stored in the value portion of the metadata table rows with the "file" column qualifier).
+
+```
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/A0000047.rf []	12330,99000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000048.rf []	1196,1000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F000004j.rf []	1302,1000
+2< last:10000bf4e0a0004 []	localhost:9997
+2< loc:10000bf4e0a0004 []	localhost:9997
+2< srv:compact []	111
+2< srv:dir []	default_tablet
+2< srv:flush []	113
+2< srv:lock []	tservers/localhost:9997/zlock#1950397a-b2ca-4685-b70b-67ae3cd578b9#0000000000$10000bf4e0a0004
+2< srv:time []	M1618325648093
+2< ~tab:~pr []	\x00
+```
+
+Below are excerpts from the TabletServer, CompactionCoordinator, Compactor logs and metadata table. I have merged the logs in time order to make it easier to see what is happening.
+
+In the logs below the Compactor requested a compaction job from the Coordinator with an ExternalCompactionId of `de6afc1d-64ae-4abf-8bce-02ec0a79aa6c`. The Coordinator knew that TabletServer `localhost:9997` had a Tablet that needed compacting and contacted it to get the details. The CompactionManager, a component
+running in the TabletServer, returned the information to the Coordinator. The Coordinator then updates the metadata table (below the logs) for the external compaction and returns the information to the Compactor:
+

Review comment:
       Resolved in 5314ec2bc0215fe75f7f7b866fe3dd26e7cacb87







[GitHub] [accumulo-website] Manno15 commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843180033


   @dlmarion That makes sense. Thanks for the info!





[GitHub] [accumulo-website] dlmarion commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r630387173



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c

Review comment:
       good catch. I will fix. Thanks!







[GitHub] [accumulo-website] dlmarion commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843179335


   Right, so what's missing here is that you have to start the compactor and the coordinator. Here is an example that I gave to @milleruntime using uno:
   
   ```
   ./bin/uno fetch accumulo
   ./bin/uno setup accumulo
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo compaction-coordinator >coordinator.out 2> coordinator.err &
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo compactor -q DCQ1 >compactor.out 2>compactor.err &
   
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo shell -u root -p secret <<EOF
   
   config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
   config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
   createtable testTable
   config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
   config -t testTable -s table.compaction.dispatcher.opts.service=cs1
   EOF
   ```
   
   Then, you need to insert some data into the table. You could do it in the Accumulo shell using commands like:
   ```
   insert r1 q1 f1 v1
   insert r1 q1 f2 v2
   insert r1 q1 f3 v3
   insert r1 q1 f4 v4
   insert r1 q1 f5 v5
   insert r1 q1 f6 v6
   flush -t testTable -w
   insert r2 q1 f1 v7
   insert r2 q1 f2 v8
   insert r2 q1 f3 v9
   flush -t testTable -w
   compact -t testTable
   ```
   
   Or, you could do it with a script. For example, drop the following content into /tmp/insert.js:
   ```
   function insertData(tableName, numRows) {
     var bwConfig = new org.apache.accumulo.core.client.BatchWriterConfig();
     bwConfig.setMaxMemory(1024);
     var bw = connection.createBatchWriter(tableName, bwConfig);
     for (var x = 0; x < numRows; x++) {
       var mut = new org.apache.accumulo.core.data.Mutation(new org.apache.hadoop.io.Text(x));
       mut.put("cf", "cq", new org.apache.accumulo.core.data.Value(new java.lang.String("value").getBytes()));
       bw.addMutation(mut);
       // println("Adding " + x);
       if ((x % 1000) == 0) {
         connection.tableOperations().compact(tableName, null, null, java.util.Collections.emptyList(), true, false);
       }
     }
     bw.flush();
     bw.close();
   }
   ```
   
   Then, run the following in the Accumulo shell:
   ```
   script -e nashorn -f /tmp/insert.js -fx insertData -a tableName=testTable,numRows=100000
   ```
   
   The output will be in the coordinator and compactor logs. You can also start multiple compactors.





[GitHub] [accumulo-website] Manno15 commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r634605263



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c

Review comment:
       I think it should. Maybe a full example within `accumulo-examples` could be helpful as well in the future. 







[GitHub] [accumulo-website] dlmarion edited a comment on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion edited a comment on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843179335


   Right, so what's missing here is that you have to start the compactor and the coordinator. Here is an example that I gave to @milleruntime using uno:
   
   ```
   ./bin/uno fetch accumulo
   ./bin/uno setup accumulo
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo compaction-coordinator >coordinator.out 2> coordinator.err &
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo compactor -q DCQ1 >compactor.out 2>compactor.err &
   
   ./install/accumulo-2.1.0-SNAPSHOT/bin/accumulo shell -u root -p secret <<EOF
   
   config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
   config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","type":"external","queue":"DCQ1"}]'
   createtable testTable
   config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
   config -t testTable -s table.compaction.dispatcher.opts.service=cs1
   EOF
   ```
   
   Then, you need to insert some data into the table. You could do it in the Accumulo shell using commands like:
   ```
   insert r1 q1 f1 v1
   insert r1 q1 f2 v2
   insert r1 q1 f3 v3
   insert r1 q1 f4 v4
   insert r1 q1 f5 v5
   insert r1 q1 f6 v6
   flush -t testTable -w
   insert r2 q1 f1 v7
   insert r2 q1 f2 v8
   insert r2 q1 f3 v9
   flush -t testTable -w
   compact -t testTable
   ```
   
   Or, you could do it with a script. For example, drop the following content into /tmp/insert.js:
   ```
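   // Inserts numRows rows into tableName, requesting a compaction every 1000 rows.
   // `connection` is not defined here; it is expected to be provided by the shell's script environment.
   function insertData(tableName, numRows) {
     var bwConfig = new org.apache.accumulo.core.client.BatchWriterConfig();
     // Limit the writer's buffer to 1024 bytes so mutations are sent to the servers frequently.
     bwConfig.setMaxMemory(1024);
     var bw = connection.createBatchWriter(tableName, bwConfig);
     for (var x = 0; x < numRows; x++) {
       // One mutation per row; the loop counter is used as the row id.
       var mut = new org.apache.accumulo.core.data.Mutation(new org.apache.hadoop.io.Text(x));
       mut.put("cf", "cq", new org.apache.accumulo.core.data.Value(new java.lang.String("value").getBytes()));
       bw.addMutation(mut);
       // println("Adding " + x);
       if ((x % 1000) == 0) {
         // Whole table (null start/end rows), no iterators, flush=true, wait=false.
         connection.tableOperations().compact(tableName, null, null, java.util.Collections.emptyList(), true, false);
       }
     }
     bw.flush();
     bw.close();
   }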
   ```
   
   Then, run the following in the Accumulo shell:
   ```
   script -e nashorn -f /tmp/insert.js -fx insertData -a tableName=testTable,numRows=100000
   ```
   
   The output will be in the coordinator and compactor logs. You can also start multiple compactors.





[GitHub] [accumulo-website] Manno15 commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r633618122



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
       >  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server
   
   Very minor revision. I think this should say allow it _to_ know. 







[GitHub] [accumulo-website] keith-turner commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r632613635



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c

Review comment:
       There are two types of information in the metadata table related to external compactions.  
   
   The first type of metadata is stored under a tablet's row and contains information about running external compactions.  Tablets are the authority for this information and are the only ones that read or write it; the coordinator does not.
   
   The second type of information in the metadata table is stored under the ~ecomp prefix and contains information about completed or failed compactions.  This information is written by the coordinator and deleted by tablets upon successful commit of an external compaction or deleted by the coordinator when it detects a completed compaction for a tablet that no longer exists.  The primary purpose of this information is to allow the coordinator to persist information about completed compactions for tablets that are temporarily offline so that it can notify them later.
   
   Compactors reserve and commit external compactions via the coordinator (which in turn talks to tservers).  During this process they pass information about specific extents back and forth.  However, for the purpose of finding the next tserver to reserve an external compaction from, the coordinator does not maintain per-tablet information.  Rather, it maintains per-tserver summary information that helps it find the next tserver to contact.  The summary information is managed by [QueueSummaries.java](https://github.com/apache/accumulo/blob/327a48f4cf8d09a006cc137d3505bfc644e93994/server/compaction-coordinator/src/main/java/org/apache/accumulo/coordinator/QueueSummaries.java#L38) within the coordinator.
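
To make the summary idea concrete, here is a minimal Python sketch of per-tserver queue summaries. It is not the code in QueueSummaries.java: the class name `QueueSummaryIndex`, its methods, and the tserver addresses are all hypothetical, and it assumes each tserver reports only its highest priority per queue.

```python
from dataclasses import dataclass, field


@dataclass
class QueueSummaryIndex:
    """Hypothetical stand-in for the coordinator's per-tserver summaries."""

    # queue name -> {tserver address -> highest priority reported for that queue}
    summaries: dict = field(default_factory=dict)

    def update(self, tserver, reported):
        """Replace what is known about one tserver with its latest poll result."""
        for per_tserver in self.summaries.values():
            per_tserver.pop(tserver, None)
        for queue, priority in reported.items():
            self.summaries.setdefault(queue, {})[tserver] = priority

    def next_tserver(self, queue):
        """Return the tserver reporting the highest priority for queue, or None."""
        per_tserver = self.summaries.get(queue)
        if not per_tserver:
            return None
        return max(per_tserver, key=per_tserver.get)


index = QueueSummaryIndex()
index.update("tserver1:9997", {"DCQ1": 5, "LargeQ": 1})
index.update("tserver2:9997", {"DCQ1": 9})
best_for_dcq1 = index.next_tserver("DCQ1")

# tserver2 reports no queued work on the next poll, so tserver1 becomes the pick.
index.update("tserver2:9997", {})
best_after_drain = index.next_tserver("DCQ1")
```

Note that nothing in the index names a tablet or extent. Only the tserver that is contacted decides which of its tablets gets compacted, which matches the description above.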







[GitHub] [accumulo-website] milleruntime commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r632565769



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
       Is the part about tablet information, `The coordinator does not maintain per tablet information`, still valid? If so, how does the coordinator know which tablet in the metadata table to write the external compaction entry for?







[GitHub] [accumulo-website] dlmarion commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r643027740



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+

Review comment:
       I added this in 1035390.







[GitHub] [accumulo-website] dlmarion edited a comment on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion edited a comment on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843206626


   FWIW you can mix internal and external compactions, using something like:
   ```
   config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"small","type":"internal","maxSize":"16M","numThreads":8},{"name":"medium","type":"internal","maxSize":"128M","numThreads":4},{"name":"large","type":"external","queue":"LargeQ"}]'
   
   ```
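
As a rough illustration of how jobs might be routed with a list like this, the Python sketch below parses the executor JSON from the comment above and picks the first executor whose `maxSize` is larger than the job, falling back to the executor with no `maxSize`. The helper names `parse_size` and `pick_executor` are hypothetical, and the strict less-than boundary is an assumption rather than a statement of what DefaultCompactionPlanner does.

```python
import json

# Executor list copied from the comment above.
EXECUTORS_JSON = (
    '[{"name":"small","type":"internal","maxSize":"16M","numThreads":8},'
    '{"name":"medium","type":"internal","maxSize":"128M","numThreads":4},'
    '{"name":"large","type":"external","queue":"LargeQ"}]'
)

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '16M' to bytes (assumed binary units)."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(text[:-1]) * UNITS[suffix]
    return int(text)


def pick_executor(executors, job_bytes):
    """Pick the smallest bounded executor that fits, else the unbounded one."""
    bounded = sorted(
        (e for e in executors if "maxSize" in e),
        key=lambda e: parse_size(e["maxSize"]),
    )
    for executor in bounded:
        if job_bytes < parse_size(executor["maxSize"]):
            return executor
    unbounded = [e for e in executors if "maxSize" not in e]
    return unbounded[0] if unbounded else None


executors = json.loads(EXECUTORS_JSON)
small_job = pick_executor(executors, 4 * 1024**2)
medium_job = pick_executor(executors, 64 * 1024**2)
large_job = pick_executor(executors, 2 * 1024**3)
```

With this list, only jobs too large for the two internal executors end up on the external `LargeQ` queue.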





[GitHub] [accumulo-website] milleruntime commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r632572685



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.
+
+When a Compactor is free to perform work, it asks the CompactionCoordinator for the next compaction job. The CompactionCoordinator contacts the next TabletServer that has the highest priority for the Compactor's queue. The TabletServer returns the information necessary for the compaction to occur to the CompactionCoordinator, which is passed on to the Compactor. The Compaction Coordinator maintains an in-memory list of running compactions and also inserts an entry into the metadata table for the tablet to denote that an external compaction is running. When the Compactor has finished the compaction, it notifies the CompactionCoordinator which inserts an entry into the metadata table to denote that the external compaction completed and it attempts to notify the TabletServer. If successful, the TabletServer commits the major compaction. If the TabletServer is down, or the Tablet has become hosted on a different TabletServer, then the CompactionCoordinator will fail to notify the TabletServer, but the metadata table entries will remain. The major compaction will be committed in the future by the TabletServer hosting the Tablet.
+
+### External Compaction in Action
+
+Below are some examples of log entries and metadata table entries for external compactions. First, here are some metadata entries for table `2` . You can see that there are three files of different sizes
+

Review comment:
       It would be helpful to highlight which values are the sizes.







[GitHub] [accumulo-website] dlmarion commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r633712759



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.
+
+When a Compactor is free to perform work, it asks the CompactionCoordinator for the next compaction job. The CompactionCoordinator contacts the next TabletServer that has the highest priority for the Compactor's queue. The TabletServer returns the information necessary for the compaction to occur to the CompactionCoordinator, which is passed on to the Compactor. The Compaction Coordinator maintains an in-memory list of running compactions and also inserts an entry into the metadata table for the tablet to denote that an external compaction is running. When the Compactor has finished the compaction, it notifies the CompactionCoordinator which inserts an entry into the metadata table to denote that the external compaction completed and it attempts to notify the TabletServer. If successful, the TabletServer commits the major compaction. If the TabletServer is down, or the Tablet has become hosted on a different TabletServer, then the CompactionCoordinator will fail to notify the TabletServer, but the metadata table entries will remain. The major compaction will be committed in the future by the TabletServer hosting the Tablet.
+
+### External Compaction in Action
+
+Below are some examples of log entries and metadata table entries for external compactions. First, here are some metadata entries for table `2` . You can see that there are three files of different sizes
+

Review comment:
       Resolved in bf31edfa9452de80f10ea3c9171c70451237f956

##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
       Resolved in bf31edfa9452de80f10ea3c9171c70451237f956







[GitHub] [accumulo-website] Manno15 commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
Manno15 commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r633618122



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
       >  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server
   
   Very minor revision. I think this should say allow it _to_ know. 







[GitHub] [accumulo-website] jmark99 commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
jmark99 commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r630437870



##########
File path: _docs-2/getting-started/design.md
##########
@@ -88,6 +88,26 @@ Multiple Monitors can be run to provide hot-standby support in the face of failu
 forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
 at one time. Leader election will be performed internally to choose the active Monitor.
 
+### Compactor
+
+The Accumulo Compactor process is an optional application that can be used to run compactions
+outside of the TabletServer. One to many Compactors can be run on a cluster and each Compactor
+process performs one compaction at a time. The Compactor registers its existence in ZooKeeper
+and communicates with the Compaction Coordinator to retrieve it's work and to register the

Review comment:
       I think <i>it's</i> should be <i>its</i>







[GitHub] [accumulo-website] keith-turner commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r634595480



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it to know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.
+
+When a Compactor is free to perform work, it asks the CompactionCoordinator for the next compaction job. The CompactionCoordinator contacts the next TabletServer that has the highest priority for the Compactor's queue. The TabletServer returns the information necessary for the compaction to occur to the CompactionCoordinator, which is passed on to the Compactor. The Compaction Coordinator maintains an in-memory list of running compactions and also inserts an entry into the metadata table for the tablet to denote that an external compaction is running. When the Compactor has finished the compaction, it notifies the CompactionCoordinator which inserts an entry into the metadata table to denote that the external compaction completed and it attempts to notify the TabletServer. If successful, the TabletServer commits the major compaction. If the TabletServer is down, or the Tablet has become hosted on a different TabletServer, then the CompactionCoordinator will fail to notify the TabletServer, but the metadata table entries will remain. The major compaction will be committed in the future by the TabletServer hosting the Tablet.
+
+### External Compaction in Action
+
+Below are some examples of log entries and metadata table entries for external compactions. First, here are some metadata entries for table `2` . You can see that there are three files of different sizes (file size and number of entries are stored in the value portion of the metadata table rows with the "file" column qualifier).
+
+```
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/A0000047.rf []	12330,99000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000048.rf []	1196,1000
+2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F000004j.rf []	1302,1000
+2< last:10000bf4e0a0004 []	localhost:9997
+2< loc:10000bf4e0a0004 []	localhost:9997
+2< srv:compact []	111
+2< srv:dir []	default_tablet
+2< srv:flush []	113
+2< srv:lock []	tservers/localhost:9997/zlock#1950397a-b2ca-4685-b70b-67ae3cd578b9#0000000000$10000bf4e0a0004
+2< srv:time []	M1618325648093
+2< ~tab:~pr []	\x00
+```
+
+Below are excerpts from the TabletServer, CompactionCoordinator, Compactor logs and metadata table. I have merged the logs in time order to make it easier to see what is happening.
+
+In the logs below the Compactor requested a compaction job from the Coordinator with an ExternalCompactionId of `de6afc1d-64ae-4abf-8bce-02ec0a79aa6c`. The Coordinator knew that TabletServer `localhost:9997` had a Tablet that needed compacting and contacted it to get the details. The CompactionManager, a component
+running in the TabletServer, returned the information to the Coordinator. The Coordinator then updates the metadata table (below the logs) for the external compaction and returns the information to the Compactor:
+

Review comment:
       Should this example show starting the coordinator and compactor?
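
As a reading aid for the `file` rows quoted above: the value after `[]` is `<file size>,<number of entries>`. Below is a minimal Python sketch that pulls those two numbers out of a shell-formatted row. `FileEntry` and `parse_file_row` are hypothetical helper names, and the whitespace-separated layout is assumed from the quoted scan output, not taken from any Accumulo API.

```python
from typing import NamedTuple


class FileEntry(NamedTuple):
    """One 'file' column of a tablet's metadata row (hypothetical helper type)."""
    path: str
    size: int         # file size, first number in the value
    num_entries: int  # number of entries, second number in the value


def parse_file_row(line: str) -> FileEntry:
    """Parse a row printed as '<row> file:<path> [] <size>,<entries>'."""
    # Assumed layout: four whitespace-separated fields (row, column, visibility, value).
    _row, column, _visibility, value = line.split()
    family, path = column.split(":", 1)
    if family != "file":
        raise ValueError(f"not a file entry: {line!r}")
    size, num_entries = value.split(",")
    return FileEntry(path, int(size), int(num_entries))


rows = [
    "2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/A0000047.rf []\t12330,99000",
    "2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F0000048.rf []\t1196,1000",
    "2< file:hdfs://localhost:8020/accumulo/tables/2/default_tablet/F000004j.rf []\t1302,1000",
]
entries = [parse_file_row(r) for r in rows]
total_entries = sum(e.num_entries for e in entries)
```

With the three rows from the quoted example, `total_entries` is the sum of the second numbers, i.e. the entry count a compaction of all three files would read.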

##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+

Review comment:
       Not sure if this should go here.  Was thinking it would be nice to add something like the following.  I attempted to describe what edge cases are handled, w/o describing how they are handled.
   
   ```
   External compactions handle faults and major system events in Accumulo.  When a compactor process dies this will be detected and any files it had reserved in a tablet will be unreserved.  When a tserver dies, this will not impact any external compactions running on behalf of tablets that tserver was hosting.  The case of tablets not being hosted on a tserver when an external compaction tries to commit is also handled.  Tablets being deleted (by split, merge, or table deletion) will cause any associated running external compactions to be canceled.  When a user-initiated compaction is canceled, any external compactions running as part of that will be canceled.
   ```
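
The "tablet not hosted at commit time" case follows from the Overview text quoted in this thread: the completion is recorded in the metadata table before the TabletServer is notified, so a failed notification loses nothing. Here is a rough Python model of that ordering. Every name in it (`finish_external_compaction`, `commit_pending`, the dict standing in for the metadata table) is hypothetical and only illustrates the idea; it is not how Accumulo implements it.

```python
def finish_external_compaction(ecid, metadata, running, notify_tserver):
    """Coordinator side: record completion first, then try to notify the tserver.

    metadata       -- dict standing in for the tablet's metadata entries
    running        -- set standing in for the in-memory list of running compactions
    notify_tserver -- callable that raises ConnectionError if the tserver is unreachable
    Returns True if the tserver was notified, False otherwise.
    """
    running.discard(ecid)
    metadata[ecid] = "completed"  # survives even if the notification below fails
    try:
        notify_tserver(ecid)
    except ConnectionError:
        return False
    return True


def commit_pending(metadata):
    """Tserver side: commit every compaction the metadata marks as completed."""
    done = [ecid for ecid, state in metadata.items() if state == "completed"]
    for ecid in done:
        del metadata[ecid]  # committing consumes the marker
    return done


def unreachable(_ecid):
    raise ConnectionError("tserver is down or no longer hosts the tablet")


metadata = {"ecid-1": "running"}
running = {"ecid-1"}
notified = finish_external_compaction("ecid-1", metadata, running, unreachable)
# Later, whichever tserver hosts the tablet finds the marker and commits.
committed_later = commit_pending(metadata)
```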







[GitHub] [accumulo-website] Manno15 commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
Manno15 commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843174050


   I am running through this configuration example using uno. However, I am not quite seeing all the expected logs (specifically with the coordinator logs). Should they show up in the Tserver logs or is there a different location where the coordinator logs show up? 





[GitHub] [accumulo-website] dlmarion commented on a change in pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on a change in pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#discussion_r633713223



##########
File path: _docs-2/administration/compaction.md
##########
@@ -62,6 +68,151 @@ For more information see the javadoc for {% jlink org.apache.accumulo.core.spi.c
 
 The names of the compaction services and executors are used for logging and metrics.
 
+## External Compactions
+
+In Accumulo 2.1 we introduced a new optional feature that allows compactions to run
+outside of the Tablet Server.  External compactions introduces two new server processes
+in an Accumulo deployment:
+
+  * *Compactor*: Accumulo process that runs external compactions and is started with the name of a queue for which it will perform compactions.  In a typical deployment there will be many of these processes running, some for queue A, queue B, etc.  This process will only run a single compaction at a time and will communicate with the Compaction Coordinator to get a compaction job and report its status.
+
+  * *Compaction Coordinator*: a process that manages the compaction queues for all external compactions in the system and assigns compaction tasks to Compactors. In a typical deployment there will be one instance of this process in use at a time with a backup process waiting to become primary (much like the primary and secondary manager processes). This process communicates with the TabletServers to get external compaction job information and report back their status. 
+
+### Configuration
+
+Configuration for external compactions is very similar to the internal compaction example above.
+In the example below we create a Compaction Service `cs1` and configure it with an externalQueue
+named `DCQ1`. We then define the Compaction Dispatcher on table `testTable` and configure the
+table to use the `cs1` Compaction Service for planning and executing compactions.
+
+```
+config -s tserver.compaction.major.service.cs1.planner=org.apache.accumulo.core.spi.compaction.DefaultCompactionPlanner
+config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"all","externalQueue":"DCQ1"}]'
+config -t testTable -s table.compaction.dispatcher=org.apache.accumulo.core.spi.compaction.SimpleCompactionDispatcher
+config -t testTable -s table.compaction.dispatcher.opts.service=cs1
+```
+
+### Overview
+
+The CompactionCoordinator is responsible for managing the global external compaction work queue. For each external compaction queue, the tablet server will maintain an in memory priority queue of the tablets loaded on it that require external compactions. The coordinator polls all tservers to get summary information about their external compaction queues to combine the summary information to determine which tablet server to contact next to get work.  The coordinator does not maintain per tablet information, it only maintains enough information to allow it know which tablet server to contact next for a given queue.  The tablet server will then know what specific tablet in that queue needs to compact.

Review comment:
       Resolved in bf31edfa9452de80f10ea3c9171c70451237f956
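
For reference, the selection step described in the quoted Overview paragraph could be modeled roughly as follows. This is a Python sketch, not the coordinator's implementation: `CoordinatorSketch` and its methods are hypothetical names, and it assumes that a larger number means a higher priority.

```python
from collections import defaultdict


class CoordinatorSketch:
    """Keeps only per-queue summaries, never per-tablet state."""

    def __init__(self):
        # queue name -> {tserver address -> highest priority that tserver reported}
        self._summaries = defaultdict(dict)

    def update_summary(self, tserver, queue, highest_priority):
        """Record a polled summary; None means the tserver has nothing queued."""
        if highest_priority is None:
            self._summaries[queue].pop(tserver, None)
        else:
            self._summaries[queue][tserver] = highest_priority

    def next_tserver(self, queue):
        """Return the tserver to ask for work on this queue, or None if idle."""
        candidates = self._summaries.get(queue)
        if not candidates:
            return None
        # Assumption: the largest reported value is the most urgent.
        return max(candidates, key=candidates.get)


coordinator = CoordinatorSketch()
coordinator.update_summary("ts1:9997", "DCQ1", 5)
coordinator.update_summary("ts2:9997", "DCQ1", 9)
first_choice = coordinator.next_tserver("DCQ1")
```

The chosen tablet server, not the coordinator, then decides which of its tablets in that queue actually gets compacted.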







[GitHub] [accumulo-website] dlmarion commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-843206626


   FWIW you can mix internal and external compactions, using something like:
   ```
   config -s 'tserver.compaction.major.service.cs1.planner.opts.executors=[{"name":"small","maxSize":"16M","numThreads":8},{"name":"medium","maxSize":"128M","numThreads":4},{"name":"large","externalQueue":"LargeQ"}]'
   ```
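
Here is a rough Python sketch of how such a mixed executor list might route jobs. It assumes the planner picks the executor with the smallest `maxSize` that the job's total input size fits under, and sends everything else to the executor without a `maxSize`. The helper names and the size-suffix parsing are hypothetical; the JSON is the value from the command above.

```python
import json

EXECUTORS = (
    '[{"name":"small","maxSize":"16M","numThreads":8},'
    '{"name":"medium","maxSize":"128M","numThreads":4},'
    '{"name":"large","externalQueue":"LargeQ"}]'
)

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Turn '16M' into a byte count (assumed binary suffixes)."""
    suffix = text[-1].upper()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)


def pick_executor(executors_json, total_input_size):
    """Return the executor dict a job of this size would be routed to, or None."""
    executors = json.loads(executors_json)
    bounded = sorted(
        (e for e in executors if "maxSize" in e),
        key=lambda e: parse_size(e["maxSize"]),
    )
    for executor in bounded:
        if total_input_size < parse_size(executor["maxSize"]):
            return executor
    # Anything too large for the bounded executors goes to the unbounded one.
    unbounded = [e for e in executors if "maxSize" not in e]
    return unbounded[0] if unbounded else None


def is_external(executor):
    """An executor with an externalQueue runs outside the tablet server."""
    return "externalQueue" in executor


big_job = pick_executor(EXECUTORS, 1024 ** 3)
```

Under these assumptions only the jobs too large for `small` and `medium` leave the tablet server, which is the point of mixing the two kinds of executor.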





[GitHub] [accumulo-website] milleruntime commented on pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282#issuecomment-842422029


   This could definitely be follow-on work, but I was thinking it would be nice to update the design diagrams to include the external compaction components: https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/spi/compaction/doc-files/compaction-spi-design.png





[GitHub] [accumulo-website] dlmarion merged pull request #282: Documentation for external compactions

Posted by GitBox <gi...@apache.org>.
dlmarion merged pull request #282:
URL: https://github.com/apache/accumulo-website/pull/282


   

