Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/03/18 16:56:14 UTC

[GitHub] [accumulo] keith-turner opened a new issue #1043: Support stable ~del split points and avoid ~del hotspots

keith-turner opened a new issue #1043: Support stable ~del split points and avoid ~del hotspots 
URL: https://github.com/apache/accumulo/issues/1043
 
 
   When a file is deleted, an entry of the form `~del<path-to-delete>` is written to the metadata table.  This schema causes two problems.  First, bulk imports that happened around the same time sort together in the table.  If many bulk imports that happened close together in time also compact close together in time, this can create a hotspot in the metadata table, with many tablet servers trying to write to the same tablet.  Second, the active split points for the `~del` prefix of the metadata table change over time, making it hard to presplit this portion of the metadata table.
   
   One possible solution to the problem is to hash the path and include that in the metadata table like `~del<hex(hash(path))><path>`.  This would lead to a stable set of split points and avoid hotspots.  This assumes nothing cares about the sorted order, which needs to be validated.  It's possible the sorted order matters for the directory ~del entries.  
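A minimal sketch of the hashed row format described above, and of why it would give a stable set of split points. The issue does not name a hash function or a hash width, so SHA-256 truncated to 8 hex characters is an assumption here, and `hashed_del_row` and `stable_split_points` are hypothetical helper names, not Accumulo APIs.

```python
import hashlib
from typing import List

DEL_PREFIX = "~del"   # prefix taken from the issue text
HASH_HEX_CHARS = 8    # assumption: the issue does not specify a hash width


def hashed_del_row(path: str) -> str:
    """Build a row of the form ~del<hex(hash(path))><path>.

    SHA-256 is an assumption; any hash that spreads paths evenly would do.
    """
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return f"{DEL_PREFIX}{digest[:HASH_HEX_CHARS]}{path}"


def stable_split_points(num_tablets: int) -> List[str]:
    """Return evenly spaced split points over the hex hash space.

    These depend only on the hash width and tablet count, never on the
    paths being deleted, so they do not drift over time.
    """
    space = 16 ** HASH_HEX_CHARS
    return [
        f"{DEL_PREFIX}{(i * space // num_tablets):0{HASH_HEX_CHARS}x}"
        for i in range(1, num_tablets)
    ]
```

For example, `stable_split_points(4)` yields `~del40000000`, `~del80000000`, and `~delc0000000`. Because the hash prefix dominates the sort order, rows for paths created close together in time would scatter across those ranges instead of landing in one tablet.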
   
   If this can be done, upgrade needs to be considered.  Below are two possible upgrade designs.
   
    * Rewrite the metadata table on upgrade and recompute the split points.
    * Support `~del` entries with and without the hash.  One possible way to do this is with a different prefix like `~deh` that indicates a hash is present.
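The second design could be sketched as a reader that accepts both row formats. This assumes the hash is a fixed-width hex string placed directly after the `~deh` prefix; the width of 8 and the helper name `path_from_delete_row` are hypothetical and not part of the issue.

```python
from typing import Optional

DEL_PREFIX = "~del"   # legacy rows: ~del<path>
DEH_PREFIX = "~deh"   # hashed rows: ~deh<hex(hash(path))><path>
HASH_HEX_CHARS = 8    # assumption: fixed hash width, not specified in the issue


def path_from_delete_row(row: str) -> Optional[str]:
    """Extract the path to delete from either row format.

    Returns None for rows that are not delete entries.
    """
    if row.startswith(DEH_PREFIX):
        # Skip the prefix and the fixed-width hash to reach the path.
        return row[len(DEH_PREFIX) + HASH_HEX_CHARS:]
    if row.startswith(DEL_PREFIX):
        return row[len(DEL_PREFIX):]
    return None
```

With a distinct prefix, old and new entries never need to be told apart by inspecting the characters after `~del`, so existing entries could stay in place until they are processed.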
   
   It may make sense to make the same change for the `~blip` section of the metadata table as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services