You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/02 19:18:03 UTC

[GitHub] [accumulo] keith-turner opened a new issue #2138: Explore supporting external compaction processes in accumulo-cluster script

keith-turner opened a new issue #2138:
URL: https://github.com/apache/accumulo/issues/2138


   The accumulo-cluster script could be modified to support starting coordinator and compactor processes for external compactions if the corresponding files exists.  Not sure if this is a good idea in general, but thought it was worth considering. Trying to determine if this is something worth pursuing, came up with the following list of pros and cons.  What other things are there to consider?
   
   Pros
   - Makes it easier to start these processes on a cluster for testing.
   
   Cons
   - More to maintain in the accumulo-cluster script, this may only be an issue if there is specialization for compactors.
   - Compactor processes run against a queue.  This may not fit well w/ the accumulo cluster script.  Would need to figure out an elegant way to do this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-867028267


   It looks like to install `shyaml` and `pyyaml` you need to first install `python` and `pip`. In some environments you may need to specially configure `pip`. `yq` is a single binary.  It looks like `rpm` is the only format in which it's not packaged, although I found easy instructions for doing it. It could be installed on CentOS / RHEL via Snap: https://snapcraft.io/install/yq/rhel
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-866903904


   Would you expect that `yamllint` and `yq` be prerequisites for installing Accumulo so that we could use them to validate and parse the yaml file?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-875745985






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854096865






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-866979795


   Install `yamllint` and `yq`:
   
   ```
   sudo yum install yamllint
   wget https://github.com/mikefarah/yq/releases/download/v4.9.6/yq_linux_amd64 -O yq
   ```
   
   So, for this file:
   ```
   ---
   managers:
     - localhost1
     - localhost2
   
   monitors:
     - localhost1
     - localhost2
   
   trace:
     - localhost
   
   gc:
     - localhost
   
   tservers:
     - localhost1
     - localhost2
     - localhost3
     - localhost4
   
   # compaction:
   #   coordinators:
   #     - localhost1
   #     - localhost2
   #   compactors:
   #     - queue: q1
   #       hosts:
   #         - localhost1
   #         - localhost2
   #     - queue: q2
   #       hosts:
   #         - localhost1
   #         - localhost2
   
   ```
   
   These commands  work:
   
   ```
   yamllint -s hosts.yaml      # 0 == success, anything else is error and malformed
   managers=$(yq eval '.managers | join(",")' hosts.yaml) 
   monitors=$(yq eval '.monitors | join(",")' hosts.yaml)
   gc=$(yq eval '.gc | join(",")' hosts.yaml)
   tservers=$(yq eval '.tservers | join(",")' hosts.yaml)
   yq eval '.compaction' hosts.yaml     #  null --> means its commented out
   coordinators=$(yq eval '.compaction.coordinators | join(",")' hosts.yaml)
   queues=$(yq eval '[.compaction.compactors.[].queue] | join(",")' hosts.yaml)
   queue1_hosts=$(yq eval '.compaction.compactors.[] | select (.queue == "q1") | .hosts | join(",")' hosts.yaml)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854096865


   > I would like to see the hosts files consolidated into a single accumulo-cluster.conf
   
   That sounds interesting.  Any thoughts on the structure and how compactor+queue info might fit into that? For compactors I was thinking of the following possibilities, had not considered a single file.
   
   1. A dir name compactors with files under it for each queue.  The name of the file would be queue name and the contents of the file would be host to start compactors.
   2. A file named compactors where the contents are lines like : `<host>,<queue>`.  This would repeat information alot and seems error prone.
   3. A file named compactors where contents has sections.  The section is the queue name and the contents are host.  Like 
   ```
   [Q1]
   host1
   host2
   
   [Q2]
   host3
   host4
   ```
   
   (edited comment to view format of option 3 better, and to number the options, to make it easier to reference)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-870551811


   > If you prefer, we could also just do JSON instead of YAML, only because there's a ton of parsers already available.
   
   Json would work too. Unfortunately json does not allow for comments in files. So you would need two different types of files depending on whether or not you are using external compactions. The examples below are with and without external compactions.
   
   ```
   {
     "managers": [
       "localhost1",
       "localhost2"
       ],
     "monitors": [
       "localhost1",
       "localhost2"
       ],
     "tracers": [
       "localhost1"
       ],
     "gc": [
       "localhost1"
       ],
     "tservers": [
       "localhost1",
       "localhost2",
       "localhost3",
       "localhost4"
       ],
     "compaction": {
       "coordinators": [
         "localhost1",
         "localhost2"
         ],
       "compactors": [
         {
           "queue": "q1",
           "hosts": [
             "localhost1",
             "localhost2"
             ]
         },
         {
           "queue": "q2",
           "hosts": [
             "localhost1",
             "localhost2"
             ]
         }
       ]
     }
   }
   ```
   
   ```
   {
     "managers": [
       "localhost1",
       "localhost2"
       ],
     "monitors": [
       "localhost1",
       "localhost2"
       ],
     "tracers": [
       "localhost1"
       ],
     "gc": [
       "localhost1"
       ],
     "tservers": [
       "localhost1",
       "localhost2",
       "localhost3",
       "localhost4"
       ]
   }
   ```
   
    `yq` is an extension of `jq` [link](https://stedolan.github.io/jq/) and I believe that `jq` packages are more widely available ( see https://stedolan.github.io/jq/download/).  The following commands using `jq` work (very similar to the `yq` examples above):
    
    ```
    managers=$(jq '.managers | join(",")' hosts.json)
   monitors=$(jq '.monitors | join(",")' hosts.json)
   gc=$(jq '.gc | join(",")' hosts.json)
   tracers=$(jq '.tracers | join(",")' hosts.json)
   tserver=$(jq '.tservers | join(",")' hosts.json)
   jq '.compaction' hosts.json  # null means it does not exist
   coordinators=$(jq '.compaction.coordinators | join(",")' hosts-compaction.json)
   queues=$(jq '.compaction.compactors | [.[].queue] | join(",")' hosts-compaction.json)
   queue1_hosts=$(jq '.compaction.compactors | .[] | select(.queue == "q1") | .hosts | join(",")' hosts-compaction.json)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-866979795


   Install `yamllint` and `yq`:
   
   ```
   sudo yum install yamllint
   wget https://github.com/mikefarah/yq/releases/download/v4.9.6/yq_linux_amd64 -O yq
   ```
   
   So, for this file:
   ```
   ---
   managers:
     - localhost1
     - localhost2
   
   monitors:
     - localhost1
     - localhost2
   
   trace:
     - localhost
   
   gc:
     - localhost
   
   tservers:
     - localhost1
     - localhost2
     - localhost3
     - localhost4
   
   # compaction:
   #   coordinators:
   #     - localhost1
   #     - localhost2
   #   compactors:
   #     - queue: q1
   #       hosts:
   #         - localhost1
   #         - localhost2
   #     - queue: q2
   #       hosts:
   #         - localhost1
   #         - localhost2
   
   ```
   
   These commands  work:
   
   ```
   yamllint -s hosts.yaml      # 0 == success, anything else is error and malformed
   managers=$(yq eval '.managers | join(",")' hosts.yaml) 
   monitors=$(yq eval '.monitors | join(",")' hosts.yaml)
   gc=$(yq eval '.gc | join(",")' hosts.yaml)
   tservers=$(yq eval '.tservers | join(",")' hosts.yaml)
   yq eval '.compaction' hosts.yaml     #  null --> means its commented out
   yq eval '.compaction.coordinators | join(",")' hosts.yaml
   queues=$(yq eval '[.compaction.compactors.[].queue] | join(",")' hosts.yaml)
   queue1_hosts=$(yq eval '.compaction.compactors.[] | select (.queue == "q1") | .hosts | join(",")' hosts.yaml)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-870957594


   JSON is uglier, but more ubiquitous. Ultimately, I think I'd prefer YAML, but if there are things we can do to make it depend on less (such as having a bit of java code using one of our existing dependencies) to make the bar lower, that would be nice. If not, I think I'd still rather users get `yq` or `shyaml` themselves than force the ugliness of JSON on them.
   
   Could also do simple properties files as well, but the schema would be a little bit frustrating. Commons-configuration2 could easily manage it, though... but if we're going that route, it'd be better to just use YAML, since commons-configuration2 can handle that too.
   
   ```
   manager.hosts=managerhost1,managerhost2
   ...
   compaction.coordinator.1=server1
   compaction.coordinator.2=server2
   compaction.queues.q1.compactors=server3,server4
   compaction.queues.q2.compactors=server5,server6
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854096865


   > I would like to see the hosts files consolidated into a single accumulo-cluster.conf
   
   That sounds interesting.  Any thoughts on the structure and how compactor+queue info might fit into that? For compactors I was thinking of the following possibilities, had not considered a single file.
   
   1. A dir name compactors with files under it for each queue.  The name of the file would be queue name and the contents of the file would be host to start compactors.
   2. A file named compactors where the contents are lines like : `<host>,<queue>`.  This would repeat information alot and seems error prone.
   3. A file named compactors where contents has sections.  The section is the queue name and the contents are host.  Like 
   ```
   [Q1]
   host1
   host2
   
   [Q2]
   host3
   host4
   ```
   
   (comment edited by @ctubbsii to view format of option 3 better, and to number the options, to make it easier to reference)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-875745985


   One approach to supporting yaml is that we write a Java program to parse the yaml and emit text that is easy to deal with in bash.  This avoid the dependency on `yq` or `shyaml` commands, while possibly introducing a yaml dep in java.  Then could do something like the following in the accumulo-cluster bash script.
   
   ```bash
   accumulo org.apache.accumulo.core.cluster.YamlExtract <cluster conf file>
   ```
   
   The java program could extract specific information based on arguments OR convert the yaml to CSV like the following that is easy to process in bash.
   
   ```
   tserver,host1
   tserver,host2
   tserver,host3
   manager,host4
   manager,host4
   gc,host5
   monitor,host6
   compactor,host1,Q1
   compactor,host2,Q2
   compactor,host3,Q2
   compactor,host3,Q2
   ```
   
   Not sure if its better to make the java program smart w/ lots of options or make it dumb and spit out something like CSV moving more logic to bash.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-867008292


   > Would you expect that `yamllint` and `yq` be prerequisites for installing Accumulo so that we could use them to validate and parse the yaml file?
   
   I've never used these. I'm not sure what we'd use to parse. However, I don't think it'd be unreasonable to add a yaml parser dependency for the `accumulo-cluster` script. (Note: I specified that it'd be fore the `accumulo-cluster` script, because I really think of that as a separate cluster-management utility that ships alongside of Accumulo, but is not *part of* Accumulo. It's optional, and users may wish to use a completely different cluster management mechanism. These are only included in the tarball because we've trained users to expect some kind of cluster management scripts ... aka 'start-all'... from early days. My preference is that they live in a separate repo, or live under a `contrib` folder. I mention this here, only to clarify the distinction between what is "Accumulo" and what is "accumulo-cluster", so as to keep the interface between the two separate and avoid tight coupling between the two).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854096865


   > I would like to see the hosts files consolidated into a single accumulo-cluster.conf
   
   That sounds interesting.  Any thoughts on the structure and how compactor+queue info might fit into that? For compactors I was thinking of the following possibilities, had not considered a single file.
   
   - A dir name compactors with files under it for each queue.  The name of the file would be queue name and the contents of the file would be host to start compactors.
   - A file named compactors where the contents are lines like : `<host>,<queue>`.  This would repeat information alot and seems error prone.
   - A file named compactors where contents has sections.  The section is the queue name and the contents are host.  Like `[Q1]\nhost1\nhost2\n[Q2]\nhost3\nhost4`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-866971695


   I have not done anything w/ yaml using command line tools.  I wonder if there are tools that would make it possible to convert the existing files to yaml in a few lines of bash.  If so the script could detect the old files and just print an error message w/ the suggested commands to run to convert them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854060993






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854157351


   Of the three, option 3 is closest to what I was thinking. However, if it was done in a single file, it would strongly depend on the format for that file. If it were YAML, it might look something like:
   
   ```yaml
   managers:
    - host1
   
   gc:
    - host1
   
   tservers:
    - host2
    - host3
   
   compaction:
     coordinator: host4
     queues:
      -
        name: queue1
        workers: [ host5, host6 ]
      -
        name: queue2
        workers:
         - host7
         - host8
         - host9
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion closed issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion closed issue #2138:
URL: https://github.com/apache/accumulo/issues/2138


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-875858238


   @keith-turner that's basically what I was thinking, since we probably have everything we need on the class path already in `lib/`. I wasn't sure which way to go either, but I think I prefer keeping the bash simpler by having arguments. Emitting lists of things as newline- or space-delimited makes it easier to loop over, if necessary, in bash. (But we'd want to avoid making it so smart that it's effectively another yq or shyaml implementation... we're not trying to write a generic utility.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-867268234


   If you do `sudo dnf install shyaml`, it automatically pulls in dependencies. You don't need to install pip at all, and since dnf/yum already depends on python, it will already be installed. That said, I already use snaps for a few things, so I didn't have a problem installing the snap for `yq`. So, whatever you think would be easiest for users. If you prefer, we could also just do JSON instead of YAML, only because there's a ton of parsers already available.
   
   We could also do something in Java. We already have commons-configuration2 on the classpath, and it supports YAML. We already call Java in the accumulo-cluster script for ZooZap. It wouldn't be hard to write something small that parses our config file for us and extracts the relevant bits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-870551811


   > If you prefer, we could also just do JSON instead of YAML, only because there's a ton of parsers already available.
   
   Json would work too. Unfortunately json does not allow for comments in files. So you would need two different types of files depending on whether or not you are using external compactions. The examples below are with and without external compactions.
   
   ```
   {
     "managers": [
       "localhost1",
       "localhost2"
       ],
     "monitors": [
       "localhost1",
       "localhost2"
       ],
     "tracers": [
       "localhost1"
       ],
     "gc": [
       "localhost1"
       ],
     "tservers": [
       "localhost1",
       "localhost2",
       "localhost3",
       "localhost4"
       ],
     "compaction": {
       "coordinators": [
         "localhost1",
         "localhost2"
         ],
       "compactors": [
         {
           "queue": "q1",
           "hosts": [
             "localhost1",
             "localhost2"
             ]
         },
         {
           "queue": "q2",
           "hosts": [
             "localhost1",
             "localhost2"
             ]
         }
       ]
     }
   }
   ```
   
   ```
   {
     "managers": [
       "localhost1",
       "localhost2"
       ],
     "monitors": [
       "localhost1",
       "localhost2"
       ],
     "tracers": [
       "localhost1"
       ],
     "gc": [
       "localhost1"
       ],
     "tservers": [
       "localhost1",
       "localhost2",
       "localhost3",
       "localhost4"
       ]
   }
   ```
   
    `yq` is an extension of `jq` [link](https://stedolan.github.io/jq/) and I believe that `jq` packages are more widely available ( see https://stedolan.github.io/jq/download/).  The following commands using `jq` work (very similar to the `yq` examples above):
    
    ```
    managers=$(jq '.managers | join(",")' hosts.json)
   monitors=$(jq '.monitors | join(",")' hosts.json)
   gc=$(jq '.gc | join(",")' hosts.json)
   tracers=$(jq '.tracers | join(",")' hosts.json)
   tserver=$(jq '.tservers | join(",")' hosts.json)
   jq '.compaction' hosts.json  # null means it does not exist
   coordinators=$(jq '.compaction.coordinators | join(",")' hosts-compaction.json)
   queues=$(jq '.compaction.compactors | [.[].queue] | join(",")' hosts-compaction.json)
   queue1_hosts=$(jq '.compaction.compactors | .[] | select(.queue == "q1") | .hosts | join(",")' hosts-compaction.json)
    ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-878538662


   > @keith-turner that's basically what I was thinking, since we probably have everything we need on the class path already in lib/.
   
   I didn't see your earlier message I see it now.  It seems like doing the heavy lifting in Java is the best way to go, especially if we already had the needed deps on the classpath.
   
   > I think I prefer keeping the bash simpler by having arguments.
   
   I think I do too.  I also agree that we want to avoid creating something like yq or shyaml in Java.  The Java code and options should be tailored to the accumulo cluster file format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-875745985


   One approach to supporting yaml is that we write a Java program to parse the yaml and emit text that is easy to deal with in bash.  This avoids the dependency on `yq` or `shyaml` commands, while possibly introducing a yaml dep in java.  Then could do something like the following in the accumulo-cluster bash script.
   
   ```bash
   accumulo org.apache.accumulo.core.cluster.YamlExtract <cluster conf file>
   ```
   
   The java program could extract specific information based on arguments OR convert the yaml to CSV like the following that is easy to process in bash.
   
   ```
   tserver,host1
   tserver,host2
   tserver,host3
   manager,host4
   manager,host7
   gc,host5
   monitor,host6
   compactor,host1,Q1
   compactor,host2,Q2
   compactor,host3,Q2
   compactor,host3,Q2
   ```
   
   Not sure if its better to make the java program smart w/ lots of options or make it dumb and spit out something like CSV moving more logic to bash.  Making the java program smarter would look like the following, where the list of tservers is extracted from the yaml file.
   
   ```bash
   $ accumulo org.apache.accumulo.core.cluster.YamlExtract --server-type tserver <cluster conf file>
   host1
   host2
   host3
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-867015784


   I am concerned about adding `yq` as a dependency, since it's not conveniently packaged for RPM-based distros. I wonder if `shyaml` might be better, which wraps `pyyaml`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-875858238


   @keith-turner that's basically what I was thinking, since we probably have everything we need on the class path already in `lib/`. I wasn't sure which way to go either, but I think I prefer keeping the bash simpler by having arguments. Emitting lists of things as newline- or space-delimited makes it easier to loop over, if necessary, in bash.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-902866074


   @keith-turner @ctubbsii  - take a look at the draft PR, let me know if this is what you had in mind. I have not tested it yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] dlmarion edited a comment on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
dlmarion edited a comment on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-866979795


   Install `yamllint` and `yq`:
   
   ```
   sudo yum install yamllint
   wget https://github.com/mikefarah/yq/releases/download/v4.9.6/yq_linux_amd64 -O yq
   ```
   
   So, for this file:
   ```
   ---
   managers:
     - localhost1
     - localhost2
   
   monitors:
     - localhost1
     - localhost2
   
   tracer: # EDIT by @ctubbsii : this should be 'tracer' not 'trace', since this refers to the accumulo-tracer service.
     - localhost
   
   gc:
     - localhost
   
   tservers:
     - localhost1
     - localhost2
     - localhost3
     - localhost4
   
   # compaction:
   #   coordinators:
   #     - localhost1
   #     - localhost2
   #   compactors:
   #     - queue: q1
   #       hosts:
   #         - localhost1
   #         - localhost2
   #     - queue: q2
   #       hosts:
   #         - localhost1
   #         - localhost2
   
   ```
   
   These commands  work:
   
   ```
   yamllint -s hosts.yaml      # 0 == success, anything else is error and malformed
   managers=$(yq eval '.managers | join(",")' hosts.yaml) 
   monitors=$(yq eval '.monitors | join(",")' hosts.yaml)
   gc=$(yq eval '.gc | join(",")' hosts.yaml)
   tservers=$(yq eval '.tservers | join(",")' hosts.yaml)
   yq eval '.compaction' hosts.yaml     #  null --> means its commented out
   coordinators=$(yq eval '.compaction.coordinators | join(",")' hosts.yaml)
   queues=$(yq eval '[.compaction.compactors.[].queue] | join(",")' hosts.yaml)
   queue1_hosts=$(yq eval '.compaction.compactors.[] | select (.queue == "q1") | .hosts | join(",")' hosts.yaml)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854060993


   Rather than create new hosts files, I would like to see the hosts files consolidated into a single accumulo-cluster.conf file which includes the hosts for all the services the accumulo-cluster script manages (there may be an existing ticket for this... I didn't check before writing this). If host configuration were streamlined into a single config file, it would be easier to specify the startup of additional services, perhaps even with separate config for each host.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on issue #2138: Explore supporting external compaction processes in accumulo-cluster script

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2138:
URL: https://github.com/apache/accumulo/issues/2138#issuecomment-854096865


   > I would like to see the hosts files consolidated into a single accumulo-cluster.conf
   
   That sounds interesting.  Any thoughts on the structure and how compactor+queue info might fit into that? For compactors I was thinking of the following possibilities, had not considered a single file.
   
   - A dir name compactors with files under it for each queue.  The name of the file would be queue name and the contents of the file would be host to start compactors.
   - A file named compactors where the contents are lines like : `<host>,<queue>`.  This would repeat information alot and seems error prone.
   - A file named compactors where contents has sections.  The section is the queue name and the contents are host.  Like `[Q1]\nhost1\nhost2\n[Q2]\nhost3\nhost4`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org