Posted to notifications@accumulo.apache.org by "dlmarion (via GitHub)" <gi...@apache.org> on 2023/07/31 15:20:57 UTC

[GitHub] [accumulo] dlmarion opened a new issue, #3668: DatafileManager.importMapFiles question

dlmarion opened a new issue, #3668:
URL: https://github.com/apache/accumulo/issues/3668

   The Tablet metadata is updated in `DatafileManager.importMapFiles` [here](https://github.com/apache/accumulo/blob/2.1/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L265) outside of the tablet lock. Then, the `datafileSizes` map is modified inside of the Tablet lock [here](https://github.com/apache/accumulo/blob/2.1/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L274). [`Tablet.compareTabletInfo`](https://github.com/apache/accumulo/blob/2.1/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java#L1141) is called from a thread in the TabletServer periodically. Does updating the tablet metadata outside of the tablet lock in `importMapFiles` make it more likely that the `compareTabletInfo` check would see a file in the tablet metadata but *not* in datafileSizes? It seems to me that if `compareTabletInfo` ran between Tablet line 265 and line 269, then it would report missing files that may not be missing.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #3668: DatafileManager.importMapFiles question

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1658647765

   Any other thing that grabs the tablet lock, not just the call to `compareTabletInfo`, could hold up a bulk import and leave the tablet in a state where datafileSizes is out of sync with the tablet metadata. If the Thrift RPC that calls `importMapFiles` times out while waiting for the tablet lock, then this would be bad, right? I think we might want to update that metadata table inside the tablet lock.




[GitHub] [accumulo] ivakegg commented on issue #3668: DatafileManager.importMapFiles question

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1660199610

   Whatever the case is here, there will always be the possibility that the metadata gets updated but the in-memory data files do not.  In the case that I witnessed, the metadata tablet was extremely busy.  The result was that the metadata got updated, but the socket timed out on the client side and hence the client did not update the in-memory state.  I am wondering if there is a way for the tserver to remedy this situation when it is detected.  Alternatively, the metadata update could actually be the trigger for the tserver to update its in-memory view.  Then maybe we can avoid this half-brain scenario.




[GitHub] [accumulo] keith-turner commented on issue #3668: DatafileManager.importMapFiles question

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1658658651

   > Does updating the tablet metadata outside of the tablet lock in importMapFiles make it more likely that the compareTabletInfo check would see a file in the tablet metadata but not in datafileSizes?
   
   compareTabletInfo relies not on the tablet lock, but on counters [incremented here](https://github.com/apache/accumulo/blob/e4df61206245cd4b454debfeef2f62973cc4531b/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L246) and [incremented here](https://github.com/apache/accumulo/blob/e4df61206245cd4b454debfeef2f62973cc4531b/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L287).
   
   compareTabletInfo does the following.
   
    1. Get a copy of the tablet's update counters
    2. Read the tablet's metadata
    3. Compare the metadata read to what the tablet has in memory
    4. Get another copy of the tablet's update counters
    5. If the counters are exactly the same, it knows its check did not overlap in time with an update.  If it did not overlap in time with an update and things differ, then it logs a message.
   
   The counter increments must cover both the in-memory update and the metadata update.  There was a problem, fixed in #3392, where the counter increments did not cover everything.
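
The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the Accumulo code: the class, method, and enum names are hypothetical, and the use of two counters (one bumped when an update starts, one when it finishes) plus the extra "started equals finished" condition are assumptions made here so that an update spanning the whole check is also treated as an overlap.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch; names and structure are NOT taken from Accumulo.
class UpdateCounterSketch {
  enum Result { CONSISTENT, INCONCLUSIVE, MISMATCH }

  // Incremented before an update (metadata write + in-memory change) begins.
  private final AtomicLong started = new AtomicLong();
  // Incremented after the whole update has finished.
  private final AtomicLong finished = new AtomicLong();

  void beginUpdate() { started.incrementAndGet(); }

  void endUpdate() { finished.incrementAndGet(); }

  Result check(Supplier<Set<String>> readMetadata, Supplier<Set<String>> readInMemory) {
    // Step 1: copy the counters before reading anything.
    long startedBefore = started.get();
    long finishedBefore = finished.get();

    // Steps 2 and 3: read both views and compare them.
    boolean same = readMetadata.get().equals(readInMemory.get());

    // Step 4: copy the counters again.
    long startedAfter = started.get();
    long finishedAfter = finished.get();

    // Step 5: only trust the comparison if no update was in flight or
    // began while the two views were being read.
    boolean quiet = startedBefore == finishedBefore
        && startedBefore == startedAfter
        && finishedBefore == finishedAfter;
    if (!quiet) {
      return Result.INCONCLUSIVE; // overlapped with an update; say nothing
    }
    return same ? Result.CONSISTENT : Result.MISMATCH; // MISMATCH would be logged
  }
}
```

In this sketch a difference is only reported when the counters show the check ran entirely between updates, which is why the increments have to bracket both the metadata write and the in-memory change.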




[GitHub] [accumulo] ivakegg commented on issue #3668: DatafileManager.importMapFiles question

Posted by "ivakegg (via GitHub)" <gi...@apache.org>.
ivakegg commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1658608897

   I would expect that this is certainly the case.  What is more interesting is the case where the lock fails to be acquired, or some failure occurs, after the metadata is updated but before datafileSizes gets updated.  In that case the metadata will continually show as different from the in-memory state.  Is there additional error recovery that can be done to handle that situation?




[GitHub] [accumulo] dlmarion commented on issue #3668: DatafileManager.importMapFiles question

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1664243646

   I believe the issue here is an exception happening between line 265 and line 274. I think the simplest way to handle an exception here is to unload the tablet.
   
   https://github.com/apache/accumulo/blob/26a71d062702ad703f445ab144880c4c0e323c6e/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java#L248C10-L280
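
As a rough illustration of that idea (not the actual `DatafileManager` code), the sketch below flags a tablet for unload when something throws after the metadata write but before the in-memory map is updated. Every name in it is hypothetical, the metadata table is replaced by an in-memory map, and the `failurePoint` parameter exists only to simulate an exception between the two steps.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and structure are NOT taken from Accumulo.
class UnloadOnFailureSketch {
  private final Object tabletLock = new Object();
  // Stand-in for the metadata table.
  private final Map<String, Long> metadata = new ConcurrentHashMap<>();
  // Stand-in for the in-memory datafileSizes map, guarded by tabletLock.
  private final Map<String, Long> datafileSizes = new HashMap<>();
  private volatile boolean unloadRequested = false;

  void importFiles(Map<String, Long> newFiles, Runnable failurePoint) {
    boolean metadataWritten = false;
    try {
      metadata.putAll(newFiles); // metadata update, outside the lock
      metadataWritten = true;
      failurePoint.run(); // simulates anything that may throw in between
      synchronized (tabletLock) {
        datafileSizes.putAll(newFiles); // in-memory update, inside the lock
      }
    } catch (RuntimeException e) {
      if (metadataWritten) {
        // The two views now disagree; ask for an unload so that a later
        // reload can rebuild the in-memory state from the metadata.
        unloadRequested = true;
      }
      throw e;
    }
  }

  boolean isUnloadRequested() { return unloadRequested; }

  Set<String> inMemoryFiles() {
    synchronized (tabletLock) {
      return new HashSet<>(datafileSizes.keySet());
    }
  }
}
```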




[GitHub] [accumulo] keith-turner commented on issue #3668: DatafileManager.importMapFiles question

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on issue #3668:
URL: https://github.com/apache/accumulo/issues/3668#issuecomment-1658704950

   >  If the Thrift RPC that calls importMapFiles times out while waiting for the tablet lock, then this would be bad, right? I think we might want to update that metadata table inside the tablet lock.
   
   The intention is to avoid any I/O while holding the tablet lock and only do non-blocking things like updating in-memory data structs.  All scans get the tablet lock when they start, to determine what files and in-memory maps to use, and also reserve those resources while holding the lock.  Anything that did I/O while holding the tablet lock would negatively impact scans.  While the intention is not to do I/O while holding the tablet lock, it's hard to analyze the code and ensure this does not happen because code gets the lock and then calls functions.  Over time one of these function calls could have I/O added.  A lot of code logs using log4j while holding the tablet lock, so it's possible log4j could block on I/O also.
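
A minimal sketch of that pattern, with hypothetical names and an in-memory map standing in for the metadata table (so not the real tserver code): the slow write happens before the lock is taken, and the lock is held only for a quick in-memory update or for a scan to snapshot the file set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names and structure are NOT taken from Accumulo.
class TabletLockSketch {
  private final Object tabletLock = new Object();
  // Stand-in for the metadata table; a real write would be blocking I/O.
  private final Map<String, Long> metadataStore = new ConcurrentHashMap<>();
  // In-memory view, guarded by tabletLock.
  private final Map<String, Long> datafileSizes = new HashMap<>();

  void importFiles(Map<String, Long> newFiles) {
    // Potentially slow step: done WITHOUT holding the tablet lock so that
    // scans waiting on the lock are not delayed by I/O.
    metadataStore.putAll(newFiles);

    // Fast step: only an in-memory map update happens under the lock.
    synchronized (tabletLock) {
      datafileSizes.putAll(newFiles);
    }
  }

  // A scan takes the lock briefly to decide which files it will read.
  Set<String> startScan() {
    synchronized (tabletLock) {
      return new HashSet<>(datafileSizes.keySet());
    }
  }

  Set<String> metadataFiles() {
    return new HashSet<>(metadataStore.keySet());
  }
}
```

The cost of this ordering is the window discussed earlier in the thread: between the two steps the metadata already lists files that the in-memory map does not.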

