Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/03/11 14:42:33 UTC

[GitHub] [accumulo] drewfarris opened a new issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import

drewfarris opened a new issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import
URL: https://github.com/apache/accumulo/issues/1559
 
 
   We encountered an issue where our software produced a set of poorly partitioned rfiles and then attempted to bulk import them into Accumulo. Each of these files had data that corresponded to nearly every extent for a given table. This resulted in a very large number of major compactions, which took quite a while to work through because of namenode contention, e.g., in `getBlockLocations`.
   
   It would be nice if we could establish a threshold in the bulk import process to abort when encountering an rfile that maps to more than a specified number of extents. Granted, we should certainly use a proper partitioner when generating these rfiles, but this would provide a failsafe in the event of unexpected partitioner issues in the future.
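
   As a rough illustration of the kind of check proposed above, here is a minimal sketch in plain Java. It does not use any Accumulo API: the class `BulkImportGuard`, its methods, and the `maxTablets` parameter are all hypothetical names invented for this example, and rows are simplified to strings. It assumes the table's split points and each file's first and last row are already known, and it counts the tablets spanned by that row range, which is an upper bound on the tablets the file actually has data for.

```java
import java.util.TreeSet;

/** Hypothetical pre-import check; illustrative only, not part of Accumulo. */
final class BulkImportGuard {

    /**
     * Counts the tablets spanned by a file whose rows lie in [firstRow, lastRow].
     * Assumes tablet i covers rows in (split[i-1], split[i]] and the last tablet
     * has no upper bound, so a table with n splits has n + 1 tablets.
     */
    static int countOverlappingTablets(TreeSet<String> splits, String firstRow, String lastRow) {
        // The number of splits strictly below a row is the index of the tablet holding it.
        int firstTablet = splits.headSet(firstRow, false).size();
        int lastTablet = splits.headSet(lastRow, false).size();
        return lastTablet - firstTablet + 1;
    }

    /** Aborts (by throwing) when a file spans more tablets than the configured threshold. */
    static void checkFile(String fileName, TreeSet<String> splits,
                          String firstRow, String lastRow, int maxTablets) {
        int count = countOverlappingTablets(splits, firstRow, lastRow);
        if (count > maxTablets) {
            throw new IllegalStateException("Refusing to bulk import " + fileName
                + ": spans " + count + " tablets, threshold is " + maxTablets);
        }
    }
}
```

   A caller would run `BulkImportGuard.checkFile(...)` on every file before submitting the import, so that a single badly partitioned file fails the whole request up front rather than after tablets have been assigned.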

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] ctubbsii commented on issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import
URL: https://github.com/apache/accumulo/issues/1559#issuecomment-597727121
 
 
   Is this a problem with the new bulk import API?


[GitHub] [accumulo] milleruntime commented on issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import

Posted by GitBox <gi...@apache.org>.
milleruntime commented on issue #1559: Consider implementing failsafe to prevent large number of tablet assignments in bulk import
URL: https://github.com/apache/accumulo/issues/1559#issuecomment-597733441
 
 
   > Is this a problem with the new bulk import API?
   
   I think this can still happen with the new bulk import in 2.x as well. The version used was 1.9, but I think this is more of an enhancement to prevent bad data from causing catastrophic failures.
