You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/04/13 17:42:29 UTC

[GitHub] [accumulo] Manno15 opened a new issue #2016: Github QA occasionally hangs while running unit-tests

Manno15 opened a new issue #2016:
URL: https://github.com/apache/accumulo/issues/2016


   **Describe the bug**
   Github QA occasionally hangs while running unit tests and eventually will cancel after the allotted 60 minutes have passed. Typically hangs around the test `MiniAccumuloClusterImplTest`.  So far, this has not been reproduced locally and seems to be centered around a possible resource issue Github QA might run into. 
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Github Actions
    - Affected version(s) of this project: 2.1.0-SNAPSHOT though possibly unrelated. 
   
   **To Reproduce**
   Steps to reproduce the behavior (or a link to an example repository that reproduces the problem):
   1. Run GitHub actions on a branch until it either completes or fails. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii edited a comment on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii edited a comment on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819793131


   Okay, so fixing the timeout works. I was able to get the logs. It looks like services are starting up okay, but cannot talk to each other. The services register themselves using the local host name determined by using reverse DNS on the local IP address. When services are reached on localhost, everything works fine (e.g. services can talk to zookeeper on `localhost:33647` just fine). I don't see any errors with sending to the tracer service, but do see it listening on an IP address (`[tracer.AsyncSpanReceiver] INFO : starting span receiver with hostname 10.1.0.83`) instead of resolving a hostname.
   
   Tservers and the master in 1.10 (the build I was testing) show that they are listening on hostname `fv-az95-160`, but when the master tries to talk to the tservers, it fails to connect and times out:
   
   ```
   2021-04-14T18:10:40,775 [manager.Manager] INFO : New servers: [fv-az95-160:45343[100000b12f00006], fv-az95-160:36911[100000b12f00002]]
   2021-04-14T18:10:40,794 [manager.EventCoordinator] INFO : There are now 2 tablet servers            
   2021-04-14T18:10:40,803 [manager.Manager] INFO : tserver availability check disabled, continuing with-2 servers. To enable, set manager.startup.tserver.avail.min.count
   2021-04-14T18:10:40,956 [server.ServerUtil] WARN : System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
   2021-04-14T18:10:40,980 [manager.Manager] INFO : Setting manager lock data to fv-az95-160:35861     
   2021-04-14T18:10:41,040 [metrics.ManagerMetricsFactory] INFO : Registered replication metrics module   
   2021-04-14T18:10:41,061 [metrics.ManagerMetricsFactory] INFO : Registered FATE metrics module       
   2021-04-14T18:10:41,061 [manager.Manager] INFO : All metrics modules registered                        
   2021-04-14T18:10:41,330 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.core.spi.balancer.SimpleLoadBalancer for table +r
   2021-04-14T18:10:41,331 [manager.Manager] INFO : Assigning 1 tablets                                
   2021-04-14T18:11:20,829 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:36911          
   2021-04-14T18:11:20,830 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:45343          
   2021-04-14T18:11:20,830 [manager.Manager] ERROR: unable to get tablet server status fv-az95-160:36911[100000b12f00002] org.apache.thrift.transport.TTransportException: java.nio.channels.ClosedByInterruptException
   ```
   
   There is an additional stack trace further along, but it doesn't have any additional information, just that there was a timeout trying to connect to the tserver.
   
   So, either there is a problem with DNS/rDNS mapping between the hostname and IP address of the runner, or there is some other security / firewall policy preventing services from talking on the non-localhost IP address.
   
   This is clearly the result of some change in GitHub Actions runners, and not in our code, since it also affects minicluster in 1.10.
   
   The most likely change I can think of that could have caused this is the switch of `ubuntu-latest` from mapping to `ubuntu-18.04` to `ubuntu-20.04`. However, I don't have an Ubuntu instance to experiment with at the moment, so this is where I'm stuck for now.
   
   There's a few options forward, if it is an issue with Ubuntu 20.04:
   1. ~Force using ubuntu-18.04 instead of ubuntu-latest~ didn't work, test still hung and timed out with the same errors connecting
   2. ~Disable firewalld (or other firewall, if it's running on Ubuntu 20.04 runners)~ didn't work, the firewall is already inactive
   3. ~Add firewall rules (if the problem is the firewall)~ firewall is already inactive
   4. Fix rDNS name lookup for the local IP address by adding a hosts entry to /etc/hosts or doing something in /etc/nsswitch.conf to use the myhostname local name service rather than DNS and hostnamectl (if the problem is DNS)
   5. Force minicluster services to use localhost
   6. Get GitHub to fix it, if it's a problem with the ubuntu-20.04 image
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-818994661


   I just saw this happen in the testing repo too, which may indicate it has nothing to do with recent changes to the main repo.
   https://github.com/apache/accumulo-testing/pull/138/checks?check_run_id=2336663420


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] Manno15 commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
Manno15 commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820556363


   According to https://github.blog/changelog/2020-10-29-github-actions-ubuntu-latest-workflows-will-use-ubuntu-20-04/. The change to 20.04 should have happened a few months ago (unless it took them this long to roll it out). I do not see any similar issue in their virtual-environments repository. If we are confident that it exists on their end, we can create a ticket there. 
    
    It also says 18.04 is still supported so we test against that as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820782637


   I filed a bug report with GitHub support.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819816644


   If it helps, locally I run Ubuntu 20.04.2 LTS (Focal Fossa) and have not seen this occur.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] asfgit closed issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
asfgit closed issue #2016:
URL: https://github.com/apache/accumulo/issues/2016


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820773438


   Okay, so I've spent a few hours looking into this today, and I think it's a bug in GitHub Actions DNS servers.
   
   If you create a GitHub Actions job that simply prints the output of `hostname -i` and `hostname -I`, you can clearly see there's a problem. They don't return the same IP address. The former uses DNS to show the IP address based on the current machine's hostname. The latter shows the IP addresses for the non-loopback network interfaces on the current machine. The IP address from the DNS lookup does not match any of the network interfaces for this machine. So, clearly, something is wrong. You can confirm the local network interfaces with `ifconfig` or `ip -4 -br addr`.
   
   Since Accumulo services are listening on `0.0.0.0`, and using `hostname` to advertise themselves in ZooKeeper, and since `hostname` results in an IP address resolution that doesn't match any of the IP addresses for any of the local network interfaces on the current machine, of course it can't talk to the Accumulo process.... because it's using the wrong IP address!
   
   I tried a few different methods to force the name lookup to resolve correctly, including using `myhostname` entry in `/etc/nsswitch.conf` prior to `files dns`, I tried dropping `dns` from that as well, and I tried appending an entry in `/etc/hosts` with the correct IP address for the current host. All of these are hacks, and nothing seemed to work quite right. Another workaround would be to force minicluster services to listen on `localhost`/`127.0.0.1`. This isn't a great solution, but it might resolve the problems with misconfigured DNS in GitHub Actions runners.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-826936225


   Upstream issue is now at actions/virtual-environments#3185
   
   They have codified the recommended workaround (to update `/etc/hosts` with the eth0 IP address for the current runner's `hostname`). Since the workaround that I made first checks to see if there's already an entry in `/etc/hosts` before it adds one, our workaround is effectively deactivated for now.
   
   If the upstream workaround is removed, then our workaround will automatically apply. If they fix the DNS records so that a workaround is no longer needed at all, then we can revisit this (or change the way we check to see if a fix is needed).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820964113


   My previous attempt to fix this had a typo in the /etc/hosts filename. Correcting that typo should fix the issue, even though it's a hack. See PR #2024 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820948839


   [Here's another report](https://lists.apache.org/thread.html/r40b2da182952f21a3b94696e6f1bf8594e7ff16e1c5979fe8b2d51c8%40%3Cbuilds.apache.org%3E) of the same issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819692920


   I noticed that in these cases, the logs aren't being uploaded. It looks like the timeout is set on entire jobs, so the upload-artifact steps never happen if the timeout is reached in running the Maven build. This can be addressed by moving the timeout-minutes to the steps, rather than under the job. That way, if any individual step times out, the other steps (like uploading log artifacts) still run.
   
   Also, the last thing done in a workflow before the runner shuts down is to kill orphaned processes. I noticed that there may be fewer processes running than expected. Here is an example from a recent timeout:
   
   ```
   2021-04-14T17:17:43.3208229Z Cleaning up orphan processes
   2021-04-14T17:17:43.3656311Z Terminate orphan process: pid (1580) (java)
   2021-04-14T17:17:43.3799021Z Terminate orphan process: pid (18229) (sh)
   2021-04-14T17:17:43.3833188Z Terminate orphan process: pid (18233) (java)
   2021-04-14T17:17:43.3882297Z Terminate orphan process: pid (18253) (java)
   2021-04-14T17:17:43.3938861Z Terminate orphan process: pid (18361) (java)
   2021-04-14T17:17:43.4044764Z Terminate orphan process: pid (18362) (java)
   2021-04-14T17:17:43.4095156Z Terminate orphan process: pid (18475) (java)
   2021-04-14T17:17:43.4151569Z Terminate orphan process: pid (18476) (java)
   ```
   
   I'm not sure which java processes are still running at the end of the job, but it looks like mini isn't starting or shutting down correctly when everything times out and is terminated. It would be useful to get more information in the output about what these processes are, in order to determine what is causing the problem. A simple "ps aux | grep jav[a]" or similar might help provide more insight into which processes are left running (and perhaps which aren't running that should have been).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819793131


   Okay, so fixing the timeout works. I was able to get the logs. It looks like services are starting up okay, but cannot talk to each other. The services register themselves using the local host name determined by using reverse DNS on the local IP address. When services are reached on localhost, everything works fine (e.g. services can talk to zookeeper on `localhost:33647` just fine). I don't see any errors with sending to the tracer service, but do see it listening on an IP address (`[tracer.AsyncSpanReceiver] INFO : starting span receiver with hostname 10.1.0.83`) instead of resolving a hostname.
   
   Tservers and the master in 1.10 (the build I was testing) show that they are listening on hostname `fv-az95-160`, but when the master tries to talk to the tservers, it fails to connect and times out:
   
   ```
   2021-04-14T18:10:40,775 [manager.Manager] INFO : New servers: [fv-az95-160:45343[100000b12f00006], fv-az95-160:36911[100000b12f00002]]
   2021-04-14T18:10:40,794 [manager.EventCoordinator] INFO : There are now 2 tablet servers            
   2021-04-14T18:10:40,803 [manager.Manager] INFO : tserver availability check disabled, continuing with-2 servers. To enable, set manager.startup.tserver.avail.min.count
   2021-04-14T18:10:40,956 [server.ServerUtil] WARN : System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
   2021-04-14T18:10:40,980 [manager.Manager] INFO : Setting manager lock data to fv-az95-160:35861     
   2021-04-14T18:10:41,040 [metrics.ManagerMetricsFactory] INFO : Registered replication metrics module   
   2021-04-14T18:10:41,061 [metrics.ManagerMetricsFactory] INFO : Registered FATE metrics module       
   2021-04-14T18:10:41,061 [manager.Manager] INFO : All metrics modules registered                        
   2021-04-14T18:10:41,330 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.core.spi.balancer.SimpleLoadBalancer for table +r
   2021-04-14T18:10:41,331 [manager.Manager] INFO : Assigning 1 tablets                                
   2021-04-14T18:11:20,829 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:36911          
   2021-04-14T18:11:20,830 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:45343          
   2021-04-14T18:11:20,830 [manager.Manager] ERROR: unable to get tablet server status fv-az95-160:36911[100000b12f00002] org.apache.thrift.transport.TTransportException: java.nio.channels.ClosedByInterruptException
   ```
   
   There is an additional stack trace further along, but it doesn't have any additional information, just that there was a timeout trying to connect to the tserver.
   
   So, either there is a problem with DNS/rDNS mapping between the hostname and IP address of the runner, or there is some other security / firewall policy preventing services from talking on the non-localhost IP address.
   
   This is clearly the result of some change in GitHub Actions runners, and not in our code, since it also affects minicluster in 1.10.
   
   The most likely change I can think of that could have caused this is the switch of `ubuntu-latest` from mapping to `ubuntu-18.04` to `ubuntu-20.04`. However, I don't have an Ubuntu instance to experiment with at the moment, so this is where I'm stuck for now.
   
   There's a few options forward, if it is an issue with Ubuntu 20.04:
   1. Force using ubuntu-18.04 instead of ubuntu-latest
   2. Disable firewalld (or other firewall, if it's running on Ubuntu 20.04 runners)
   3. Add firewall rules (if the problem is the firewall)
   4. Fix rDNS name lookup for the local IP address by adding a hosts entry to /etc/hosts or doing something in /etc/nsswitch.conf to use the myhostname local name service rather than DNS and hostnamectl (if the problem is DNS)
   5. Force minicluster services to use localhost
   6. Get GitHub to fix it, if it's a problem with the ubuntu-20.04 image
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] mjwall commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
mjwall commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-820859984


   > 
   > 
   > Okay, so I've spent a few hours looking into this today, and I think it's a bug in GitHub Actions DNS servers.
   > 
   > If you create a GitHub Actions job that simply prints the output of `hostname -i` and `hostname -I`, you can clearly see there's a problem. They don't return the same IP address. The former uses DNS to show the IP address based on the current machine's hostname. The latter shows the IP addresses for the non-loopback network interfaces on the current machine. The IP address from the DNS lookup does not match any of the network interfaces for this machine. So, clearly, something is wrong. You can confirm the local network interfaces with `ifconfig` or `ip -4 -br addr`.
   > 
   > Since Accumulo services are listening on `0.0.0.0`, and using `hostname` to advertise themselves in ZooKeeper, and since `hostname` results in an IP address resolution that doesn't match any of the IP addresses for any of the local network interfaces on the current machine, of course it can't talk to the Accumulo process.... because it's using the wrong IP address!
   > 
   > I tried a few different methods to force the name lookup to resolve correctly, including using `myhostname` entry in `/etc/nsswitch.conf` prior to `files dns`, I tried dropping `dns` from that as well, and I tried appending an entry in `/etc/hosts` with the correct IP address for the current host. All of these are hacks, and nothing seemed to work quite right. Another workaround would be to force minicluster services to listen on `localhost`/`127.0.0.1`. This isn't a great solution, but it might resolve the problems with misconfigured DNS in GitHub Actions runners.
   
   Nice detective work @ctubbsii 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii edited a comment on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii edited a comment on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819793131


   Okay, so fixing the timeout works. I was able to get the logs. It looks like services are starting up okay, but cannot talk to each other. The services register themselves using the local host name determined by using reverse DNS on the local IP address. When services are reached on localhost, everything works fine (e.g. services can talk to zookeeper on `localhost:33647` just fine). I don't see any errors with sending to the tracer service, but do see it listening on an IP address (`[tracer.AsyncSpanReceiver] INFO : starting span receiver with hostname 10.1.0.83`) instead of resolving a hostname.
   
   Tservers and the master in 1.10 (the build I was testing) show that they are listening on hostname `fv-az95-160`, but when the master tries to talk to the tservers, it fails to connect and times out:
   
   ```
   2021-04-14T18:10:40,775 [manager.Manager] INFO : New servers: [fv-az95-160:45343[100000b12f00006], fv-az95-160:36911[100000b12f00002]]
   2021-04-14T18:10:40,794 [manager.EventCoordinator] INFO : There are now 2 tablet servers            
   2021-04-14T18:10:40,803 [manager.Manager] INFO : tserver availability check disabled, continuing with-2 servers. To enable, set manager.startup.tserver.avail.min.count
   2021-04-14T18:10:40,956 [server.ServerUtil] WARN : System swappiness setting is greater than ten (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
   2021-04-14T18:10:40,980 [manager.Manager] INFO : Setting manager lock data to fv-az95-160:35861     
   2021-04-14T18:10:41,040 [metrics.ManagerMetricsFactory] INFO : Registered replication metrics module   
   2021-04-14T18:10:41,061 [metrics.ManagerMetricsFactory] INFO : Registered FATE metrics module       
   2021-04-14T18:10:41,061 [manager.Manager] INFO : All metrics modules registered                        
   2021-04-14T18:10:41,330 [balancer.TableLoadBalancer] INFO : Loaded class org.apache.accumulo.core.spi.balancer.SimpleLoadBalancer for table +r
   2021-04-14T18:10:41,331 [manager.Manager] INFO : Assigning 1 tablets                                
   2021-04-14T18:11:20,829 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:36911          
   2021-04-14T18:11:20,830 [rpc.ThriftUtil] WARN : Failed to open transport to fv-az95-160:45343          
   2021-04-14T18:11:20,830 [manager.Manager] ERROR: unable to get tablet server status fv-az95-160:36911[100000b12f00002] org.apache.thrift.transport.TTransportException: java.nio.channels.ClosedByInterruptException
   ```
   
   There is an additional stack trace further along, but it doesn't have any additional information, just that there was a timeout trying to connect to the tserver.
   
   So, either there is a problem with DNS/rDNS mapping between the hostname and IP address of the runner, or there is some other security / firewall policy preventing services from talking on the non-localhost IP address.
   
   This is clearly the result of some change in GitHub Actions runners, and not in our code, since it also affects minicluster in 1.10.
   
   The most likely change I can think of that could have caused this is the switch of `ubuntu-latest` from mapping to `ubuntu-18.04` to `ubuntu-20.04`. However, I don't have an Ubuntu instance to experiment with at the moment, so this is where I'm stuck for now.
   
   There's a few options forward, if it is an issue with Ubuntu 20.04:
   1. ~Force using ubuntu-18.04 instead of ubuntu-latest~ didn't work, test still hung and timed out with the same errors connecting
   2. ~Disable firewalld (or other firewall, if it's running on Ubuntu 20.04 runners)~ didn't work, the firewall is already inactive
   3. ~Add firewall rules (if the problem is the firewall)~ firewall is already inactive
   4. Fix rDNS name lookup for the local IP address by adding a hosts entry to /etc/hosts or doing something in /etc/nsswitch.conf to use the myhostname local name service rather than DNS and hostnamectl (if the problem is DNS)
   5. Force minicluster services to use localhost
   6. Get GitHub to fix it, if it's a problem with runner
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2016: Github QA occasionally hangs while running unit-tests

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2016:
URL: https://github.com/apache/accumulo/issues/2016#issuecomment-819720780


   I updated the timeout settings, but have not yet attempted to add any logging about the still running processes. I'm hoping the logs will have some insights, now that they *should* be uploaded on timeouts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org