Posted to notifications@accumulo.apache.org by "ivakegg (via GitHub)" <gi...@apache.org> on 2023/05/04 12:49:26 UTC

[GitHub] [accumulo] ivakegg opened a new issue, #3371: tservers falling over because a scan failed

ivakegg opened a new issue, #3371:
URL: https://github.com/apache/accumulo/issues/3371

   Scans were being performed on a table, and a threading issue in the VFS class loader was causing the scans to fail.  I am handling the threading issue under VFS-836 (in JIRA).  The problem with Accumulo is that these failures (NoClassDefFoundError) cause the tserver to be halted.  A user's scan should not cause the tserver to be halted.  It is perfectly conceivable that the iterator stack could use classes that are not accessible to the tserver it is running on if, for example, the table.context is specified incorrectly.  This happens periodically on our systems when we have various deployment issues.
   
   Accumulo 2.1.1-SNAPSHOT
   CentOS 7.3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1561224007

   Looking at the error more closely, it was not actually in the path of the VFS class loader.  I am reopening this until I can dig further.




[GitHub] [accumulo] ctubbsii commented on issue #3371: tservers falling over because a scan failed

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1535932841

   While this may be a bug caused by commons-vfs, this does not appear to be a bug in Accumulo. I think Accumulo is behaving correctly in the face of Errors thrown by the optional third-party component. See my comment at https://github.com/apache/accumulo/pull/3375#pullrequestreview-1414393608




[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1581501943

   I think I have now done enough to be able to mitigate this problem in our systems.  Closing this issue.




[GitHub] [accumulo] ivakegg closed issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg closed issue #3371: tservers falling over because a scan failed
URL: https://github.com/apache/accumulo/issues/3371




[GitHub] [accumulo] EdColeman commented on issue #3371: tservers falling over because a scan failed

Posted by "EdColeman (via GitHub)" <gi...@apache.org>.
EdColeman commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1535433324

   Fixed by PR #3375  (PR is a possible approach)




[GitHub] [accumulo] dlmarion commented on issue #3371: tservers falling over because a scan failed

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1574282484

   @ivakegg - were you able to confirm that the issue is related to your scan trying to load a class that did not exist on the classpath (using table.classpath.context) and that the issue was due to something external to the Accumulo codebase?




[GitHub] [accumulo] EdColeman commented on issue #3371: tservers falling over because a scan failed

Posted by "EdColeman (via GitHub)" <gi...@apache.org>.
EdColeman commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1534837590

   I'll work to create a test iterator that throws Error (instead of an exception) to see if I can reproduce in a local / test environment. 
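
A minimal sketch of what such a reproduction could look like outside Accumulo. Everything here is an assumption made for illustration: ErrorThrowingIterator is a plain java.util.Iterator stand-in, not an Accumulo SortedKeyValueIterator, and the worker thread only imitates a scan thread. It shows that an Error thrown from next() is not an Exception, so ordinary exception handling in the scan path does not stop it, and it escapes to the thread's uncaught-exception handler.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a scan-time iterator; NOT an Accumulo iterator class.
class ErrorThrowingIterator implements Iterator<String> {
  private int remaining;

  ErrorThrowingIterator(int goodEntries) {
    this.remaining = goodEntries;
  }

  @Override
  public boolean hasNext() {
    return true;
  }

  @Override
  public String next() {
    if (remaining-- > 0) {
      return "entry";
    }
    // Deliberately an Error rather than an Exception, to mimic the reported failure.
    throw new NoClassDefFoundError("simulated: Could not initialize class Example");
  }
}

class ErrorIteratorDemo {
  // Drains the iterator on a worker thread and returns whatever Throwable
  // escaped that thread (null if nothing escaped).
  static Throwable runScan(Iterator<String> it) {
    AtomicReference<Throwable> escaped = new AtomicReference<>();
    Thread worker = new Thread(() -> {
      try {
        while (it.hasNext()) {
          it.next();
        }
      } catch (Exception e) {
        // Typical "catch Exception" handling: an Error passes straight through.
      }
    }, "scan-test-worker");
    worker.setUncaughtExceptionHandler((thread, e) -> escaped.set(e));
    worker.start();
    try {
      worker.join(); // the handler runs on the dying thread before join() returns
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
    return escaped.get();
  }

  public static void main(String[] args) {
    System.out.println("escaped worker thread: " + runScan(new ErrorThrowingIterator(3)));
  }
}
```

A real test would still need an iterator implementing Accumulo's iterator interface and configured on a table; this sketch only demonstrates the Error-versus-Exception behavior that the reproduction depends on.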




[GitHub] [accumulo] ctubbsii commented on issue #3371: tservers falling over because a scan failed

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1546532083

   > From an operational standpoint, I cannot have a scan thread that exposes a problem taking down tservers, especially when it is likely that that scan is being spread across a good portion of the system.
   
   Understood. However, from another operational standpoint, we also can't have a bad scan thread corrupt a tserver's internal state and leave it running to cause more harm, such as corrupting user data or returning bad results. I've seen that too.
   
   These are competing principles, and both have merit. Which risk is greater depends on the specific application and the user's risk tolerance. The best we can do is make it possible for users to make their own choices by making things more pluggable. That is what we've been doing over the last several years: making Accumulo's components more modular, with SPI endpoints and less tightly coupled default internal components.
   




[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1561127629

   Now that I have a way of overriding the context class loader factory and passing through the configuration, I can replace the VFS class loader with one that catches NoClassDefFoundError and throws an exception instead, which will keep the tserver from falling over.
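
A rough sketch of the kind of wrapper described above, under stated assumptions. ErrorTranslatingClassLoader, SimulatedBrokenLoader, and ClassLoaderDemo are hypothetical names invented for this example; none of them is an Accumulo or commons-vfs class, and the wiring into a context class loader factory is not shown. The wrapper converts a NoClassDefFoundError raised while loading through the delegate into a ClassNotFoundException.

```java
// Hypothetical wrapper: translates a load-time NoClassDefFoundError into a checked exception.
class ErrorTranslatingClassLoader extends ClassLoader {
  ErrorTranslatingClassLoader(ClassLoader delegate) {
    super(delegate); // the delegate becomes the parent and does the actual loading
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    try {
      return super.loadClass(name, resolve);
    } catch (NoClassDefFoundError e) {
      // Surface the failure as an Exception so callers that catch Exception can handle it.
      throw new ClassNotFoundException("Could not load " + name, e);
    }
  }
}

// Test-only delegate that simulates a misbehaving loader for names under "broken.".
class SimulatedBrokenLoader extends ClassLoader {
  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (name.startsWith("broken.")) {
      throw new NoClassDefFoundError(name);
    }
    return super.loadClass(name, resolve);
  }
}

class ClassLoaderDemo {
  // Describes what happened when loading a class through the given loader.
  static String outcome(ClassLoader cl, String name) {
    try {
      cl.loadClass(name);
      return "loaded";
    } catch (ClassNotFoundException e) {
      Throwable cause = e.getCause();
      return "ClassNotFoundException caused by "
          + (cause == null ? "nothing" : cause.getClass().getSimpleName());
    }
  }

  public static void main(String[] args) {
    ClassLoader wrapped = new ErrorTranslatingClassLoader(new SimulatedBrokenLoader());
    System.out.println(outcome(wrapped, "java.lang.String"));
    System.out.println(outcome(wrapped, "broken.Example"));
  }
}
```

One caveat: a wrapper like this only sees errors raised while a class is being loaded through it. A "Could not initialize class" NoClassDefFoundError is usually raised later, when already-loaded code touches a class whose static initializer failed, so it may not pass through loadClass at all.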




[GitHub] [accumulo] ivakegg closed issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg closed issue #3371: tservers falling over because a scan failed
URL: https://github.com/apache/accumulo/issues/3371




[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1534847662

   So the error being thrown in our case is
   ```
   java.lang.NoClassDefFoundError: Could not initialize class datawave......
        at datawave......
        ...
        at org.apache.accumulo.core.iteratorsImpl.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:165)
        ...
        at org.apache.accumulo.tserver.scan.LookupTask.run(LookupTask.java:129)
        at org.apache.accumulo.tserver.session.ScanSession$ScanMeasurer.run(ScanSession.java:62)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
        at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: java.util.zip.ZipException: ZipFile closed
       ...
   
   Error thrown in thread: Thread[scan-default-Worker-14,5,main], halting VM.
   ```
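
The last line of that log suggests a thread-level handler that treats an Error differently from an Exception. A minimal sketch of such a policy follows, assuming nothing about Accumulo's actual handler: ErrorAwareHandler is a hypothetical class, the halt action is injected so the example can run without stopping the JVM, and the message used for the Exception branch is made up for illustration.

```java
import java.util.function.Consumer;

// Hypothetical handler: logs and lets the thread die on an Exception,
// but invokes a halt action on anything else (for example, an Error).
class ErrorAwareHandler implements Thread.UncaughtExceptionHandler {
  private final Consumer<String> log;
  private final Runnable haltAction; // a server might pass () -> Runtime.getRuntime().halt(-1)

  ErrorAwareHandler(Consumer<String> log, Runnable haltAction) {
    this.log = log;
    this.haltAction = haltAction;
  }

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    if (e instanceof Exception) {
      // Assumed wording; only this thread is lost.
      log.accept("Exception thrown in thread: " + t + ", thread is dead.");
    } else {
      // Wording taken from the log line quoted above.
      log.accept("Error thrown in thread: " + t + ", halting VM.");
      haltAction.run();
    }
  }
}

class ErrorAwareHandlerDemo {
  public static void main(String[] args) throws InterruptedException {
    ErrorAwareHandler handler = new ErrorAwareHandler(
        System.out::println,
        () -> System.out.println("(halt requested; not halting in this demo)"));

    Thread worker = new Thread(() -> {
      throw new NoClassDefFoundError("simulated");
    }, "scan-demo-worker");
    worker.setUncaughtExceptionHandler(handler);
    worker.start();
    worker.join();
  }
}
```

Injecting the halt action is what makes the trade-off configurable: the same handler can halt the process, or only log, depending on which risk the operator prefers to accept.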




[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1561414120

   It appears the errors may be generated via Spring, and we are simply not seeing the actual underlying cause, which may still be the VFSClassLoader.  We will try to validate this once #3399 is brought in and the context class loader factory is configured.




[GitHub] [accumulo] ivakegg commented on issue #3371: tservers falling over because a scan failed

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3371:
URL: https://github.com/apache/accumulo/issues/3371#issuecomment-1545700473

   From an operational standpoint, I cannot have a scan thread that exposes a problem taking down tservers, especially when it is likely that that scan is being spread across a good portion of the system.

