Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/11/04 16:55:04 UTC

[GitHub] [accumulo] EdColeman commented on issue #3065: Support read only Accumulo snapshots that can run in other data centers.

EdColeman commented on issue #3065:
URL: https://github.com/apache/accumulo/issues/3065#issuecomment-1303877092

   A few things.
   
   First, replication as designed only supported streaming ingest.  If you were doing bulk ingest, you needed to make other provisions anyway.
   
   As for bulk ingest and replicating the data, it really is user dependent.  You can generate the bulk import files once and then replicate those files, or you can tee the data and run the bulk ingest preparation in parallel at each location.
   
   If you elect to ship bulk import files, then you only pay the cost of generation once, but you could end up sending more data (if the ingest increases the data size, for example because of indexing or multiple views), and the latency is likely longer because the data is transmitted only after the files are generated.
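
   As a rough illustration of that trade-off, here is a minimal sketch that compares the two approaches. Everything in it is hypothetical: the function names, the `expansion` factor (how much larger the generated files are than the raw input), and the simplification that data is sent once to each additional site. It is not based on any Accumulo API.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    bytes_sent: float  # total bytes transmitted to the other sites
    prep_runs: int     # number of times the bulk prep job runs


def ship_files(raw_bytes: float, expansion: float, sites: int) -> Estimate:
    """Generate bulk import files once, then send them to every other site."""
    return Estimate(bytes_sent=raw_bytes * expansion * (sites - 1), prep_runs=1)


def tee_raw(raw_bytes: float, sites: int) -> Estimate:
    """Send the raw data to every other site; each site runs its own prep."""
    return Estimate(bytes_sent=raw_bytes * (sites - 1), prep_runs=sites)


if __name__ == "__main__":
    # Assumed numbers: 100 GB of raw input, files 2.5x larger, two sites.
    print(ship_files(100.0, 2.5, 2))
    print(tee_raw(100.0, 2))
```

   With an expansion factor above 1, shipping files sends more bytes but runs the prep job once; tee-ing sends fewer bytes but runs the prep job at every site.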
   
   If you tee the data, then you may have additional consistency questions: the bulk import processes could start at slightly different times with slightly different data, or the bulk prep job could fail or take longer at one of the locations.
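
   One way a user might detect that kind of divergence is to have each site compute a digest over the records in a batch before running its bulk prep, and then compare the digests. The sketch below is only an assumption about how that could look; `batch_digest` and the idea of exchanging digests between sites are hypothetical and not something Accumulo provides.

```python
import hashlib
from typing import Iterable


def batch_digest(records: Iterable[bytes]) -> str:
    """Order-independent digest of a batch of records (hypothetical helper).

    Records are sorted first so that two sites receiving the same records in
    a different order still produce the same digest. Each record is
    length-prefixed so that record boundaries affect the result.
    """
    h = hashlib.sha256()
    for rec in sorted(records):
        h.update(len(rec).to_bytes(8, "big"))
        h.update(rec)
    return h.hexdigest()


if __name__ == "__main__":
    site_a = [b"row1", b"row2", b"row3"]
    site_b = [b"row3", b"row1", b"row2"]  # same data, different arrival order
    site_c = [b"row1", b"row2"]           # one record missing

    print(batch_digest(site_a) == batch_digest(site_b))
    print(batch_digest(site_a) == batch_digest(site_c))
```

   A mismatch only says the batches differ; deciding which site is correct, and how to repair it, is still up to the user.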
   
   So it is really going to be user dependent.
   
   From a redundancy perspective, tee-ing the data and processing it locally at each site probably has the best fail-over posture, because the independent processes are more isolated from one another.  Without additional synchronization, though, it also has the largest opportunity for scans at different sites to return slightly different results.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org