Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/04/26 17:08:40 UTC

[GitHub] [accumulo] Manno15 opened a new issue #2035: Misconfigured Iterator blocks other tablets from loading

Manno15 opened a new issue #2035:
URL: https://github.com/apache/accumulo/issues/2035


   **Describe the bug**
   The original bug was reported here: https://issues.apache.org/jira/browse/ACCUMULO-4160
   
   If an iterator is misconfigured on a table (specifically for minor compaction in my testing), it can prevent other tables from loading their tablets and from accepting any operation beyond being taken offline and brought back online.
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: 2.1.0
   
   **To Reproduce**
   To reproduce, I edited an iterator (RowDeletingIterator for example) to throw an exception. This mimics the `BadIterator` class.
   After that:
   1. Create two tables in Accumulo and ingest something to get one tablet in each.
   2. Configure one table to use this iterator in the minc scope (the scan scope seemed to have proper error checking when run with a misconfigured iterator).
   3. Flush the table with the bad iterator. This will cause the shell to hang indefinitely. Minor compaction will retry until it completes, which cannot happen unless the bad iterator is deleted.
   4. Kill the Tserver and bring it back up. The table with the bad iterator will never load.
   5. If the good table is taken offline and then brought back online, it also will never load, and no operation can be done on it again.
   Deleting the bad iterator will allow the minor compaction to complete and the Tserver to go back to a healthy state.
   
   **Expected behavior**
   To not block other tablets from being loaded due to one table having a misconfigured iterator and minor compacting indefinitely. 
   
   **Additional context**
   It does seem that the main bug only occurs when the Tserver dies and is brought back up. Logs suggest it is still attempting to minor compact. A fix for the shell hanging indefinitely, and possibly an error message to the user, could allow someone to notice the misconfigured iterator earlier and delete it. This has also been tested with `MultiTableRecoverIT`, which uses agitation to test recovery. If one of the tables in that IT is given a misconfigured iterator, the test will hang until it times out.
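
To illustrate the idea of not hanging indefinitely and reporting an error instead, here is a minimal sketch in plain Java. It is not Accumulo code: the class `FlushWithTimeout`, the method `waitOrReport`, and the message text are all hypothetical names made up for this example, and the long sleep merely stands in for a flush that can never finish.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Accumulo.
public class FlushWithTimeout {

    // Waits for an operation for a bounded time and returns a message for the user
    // instead of blocking forever.
    public static String waitOrReport(Future<?> op, long timeout, TimeUnit unit)
            throws InterruptedException {
        try {
            op.get(timeout, unit);
            return "completed";
        } catch (TimeoutException e) {
            // Stop waiting and interrupt the stuck operation.
            op.cancel(true);
            return "timed out after " + timeout + " " + unit
                    + "; check the table's iterator configuration";
        } catch (ExecutionException e) {
            // The operation itself threw; surface the cause to the user.
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for a flush that keeps retrying and never completes.
            Future<?> stuck = pool.submit(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            System.out.println(waitOrReport(stuck, 200, TimeUnit.MILLISECONDS));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Whether a timeout like this belongs in the shell, the client API, or the server is a separate design question; the sketch only shows the bounded-wait pattern.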
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2035: Misconfigured Iterator blocks other tablets from loading

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2035:
URL: https://github.com/apache/accumulo/issues/2035#issuecomment-827047031


   The original issue was created by @EdColeman, and was a little sparse on details, but if I remember correctly, the issue might have been a typo in the class name (an error in the configuration) causing ClassNotFoundException, rather than a badly behaving iterator, which might be a related but slightly different problem. A solution may fix both scenarios. I'm not certain.





[GitHub] [accumulo] Manno15 commented on issue #2035: Misconfigured Iterator blocks other tablets from loading

Posted by GitBox <gi...@apache.org>.
Manno15 commented on issue #2035:
URL: https://github.com/apache/accumulo/issues/2035#issuecomment-827532460


   > have been a typo in the class name (an error in the configuration) causing ClassNotFoundException, 
   
   From my testing, ClassNotFoundException was handled correctly, at least in the shell.
   
   > To me, the hanging shell is not as important as being able to load other tables during a recovery if they are otherwise configured correctly. It may be that one solution solves both, but recovery will often happen unattended and may not be recognized immediately until other things start to misbehave. If you are in the shell and it hangs you'll likely notice and can act accordingly.
   
   The shell hanging was only on the initial part. The latter parts you mentioned, about recovery, do appear to happen as well after things get into a bad state. I will try to recreate the issue without relying on the minor compaction. The shell hanging was a byproduct of compactions retrying until canceled. The solution to that can be unrelated, but if we limit the number of retries before automatically cancelling (or delaying until everything else loads in), then that could solve both.
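
As a rough sketch of the retry-limit idea above, the following plain Java shows a task being retried a bounded number of times and then given up on, with the last error kept for reporting. It is not how Accumulo schedules compactions: `BoundedRetry`, `runWithRetryLimit`, `Result`, and the attempt and backoff values are hypothetical names and numbers chosen only for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration only; this does not reflect Accumulo's compaction code.
public class BoundedRetry {

    // Outcome of a retried task: whether it succeeded, how many attempts were made,
    // and the last error seen (null on success).
    public static final class Result {
        public final boolean succeeded;
        public final int attempts;
        public final Exception lastError;

        Result(boolean succeeded, int attempts, Exception lastError) {
            this.succeeded = succeeded;
            this.attempts = attempts;
            this.lastError = lastError;
        }
    }

    // Runs the task up to maxAttempts times, sleeping backoffMillis between failures,
    // and returns instead of retrying forever.
    public static Result runWithRetryLimit(Callable<Void> task, int maxAttempts,
            long backoffMillis) throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.call();
                return new Result(true, attempt, null);
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        return new Result(false, maxAttempts, last);
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a minor compaction whose iterator always throws.
        Result r = runWithRetryLimit(() -> {
            throw new IllegalStateException("simulated bad iterator");
        }, 3, 50);
        System.out.println("succeeded=" + r.succeeded + " attempts=" + r.attempts
                + " lastError=" + r.lastError);
    }
}
```

Once the limit is reached, the caller could cancel the work, defer it until other tablets have loaded, or surface the error; which of those is right here is still open.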





[GitHub] [accumulo] Manno15 edited a comment on issue #2035: Misconfigured Iterator blocks other tablets from loading

Posted by GitBox <gi...@apache.org>.
Manno15 edited a comment on issue #2035:
URL: https://github.com/apache/accumulo/issues/2035#issuecomment-827532460


   EDIT: Turns out, the flush part wasn't needed. I was able to reproduce without that middle step.





[GitHub] [accumulo] EdColeman commented on issue #2035: Misconfigured Iterator blocks other tablets from loading

Posted by GitBox <gi...@apache.org>.
EdColeman commented on issue #2035:
URL: https://github.com/apache/accumulo/issues/2035#issuecomment-827061656


   I could not recall whether it was a typo or was somehow deployment related. One possibility: a server missed updates (say it was offline during an upgrade that put down the iterator jar), then started without the necessary jar and joined the cluster.
   To me, the hanging shell is not as important as being able to load other tables during a recovery if they are otherwise configured correctly. It may be that one solution solves both, but recovery will often happen unattended and may not be recognized immediately until other things start to misbehave. If you are in the shell and it hangs you'll likely notice and can act accordingly.





[GitHub] [accumulo] Manno15 closed issue #2035: Misconfigured Iterator blocks other tablets from loading

Posted by GitBox <gi...@apache.org>.
Manno15 closed issue #2035:
URL: https://github.com/apache/accumulo/issues/2035


   

