Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/05/11 19:05:44 UTC

[GitHub] [accumulo] keith-turner opened a new issue, #3397: Support maximum age for inmemory tablet data.

keith-turner opened a new issue, #3397:
URL: https://github.com/apache/accumulo/issues/3397

   **Is your feature request related to a problem? Please describe.**
   
   With the introduction of scan servers and eventually consistent scans, users can set the property `sserver.cache.metadata.expiration` to determine how long scan servers cache the set of files for any tablet.  This property sets a rough upper bound on how old the tablet files will be when scanning a tablet on a scan server.
   
   Data in tablet server memory that has not yet been written to a file can persist for long periods of time, though, without ever being flushed (flushing to a file is what makes it visible to a scan server).  There is currently a property `table.compaction.minor.idle` that causes a minor compaction if a tablet has not been written to in that time period.  However, if the tablet is constantly being written to at a slow rate, it will not hit the idle time and may not hit the size threshold for a long time, so data could be held in memory and not visible to the scan server for long periods of time.
   
   **Describe the solution you'd like**
   
   A new tablet property that forces tablets to write out their data after a specified amount of time.  The implementation could track the time when the first write is made to tablet memory and then force a minor compaction when the time since that first write exceeds the configured value. A possible name for the new property could be `table.compaction.minor.maxAge`.
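   
   As a rough illustration of the tracking described above, here is a minimal Java sketch. It is not Accumulo code: the class `InMemoryAgeTracker`, its methods, and the injected clock are all hypothetical names chosen for this example, and `maxAgeMillis` merely stands in for the proposed `table.compaction.minor.maxAge` value.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not part of Accumulo. Tracks how long the oldest
// unflushed write has been sitting in a tablet's in-memory map.
class InMemoryAgeTracker {
  private final long maxAgeMillis;   // stand-in for table.compaction.minor.maxAge
  private final LongSupplier clock;  // injected so the sketch can be tested
  private boolean hasUnflushedData = false;
  private long firstWriteMillis = 0;

  InMemoryAgeTracker(long maxAgeMillis, LongSupplier clock) {
    this.maxAgeMillis = maxAgeMillis;
    this.clock = clock;
  }

  // Call on every write. Only the first write after a flush starts the timer.
  synchronized void recordWrite() {
    if (!hasUnflushedData) {
      hasUnflushedData = true;
      firstWriteMillis = clock.getAsLong();
    }
  }

  // Call once a minor compaction has emptied the in-memory map.
  synchronized void recordFlush() {
    hasUnflushedData = false;
  }

  // True when unflushed data exists and its first write is at least maxAge old.
  synchronized boolean shouldFlush() {
    return hasUnflushedData && clock.getAsLong() - firstWriteMillis >= maxAgeMillis;
  }

  public static void main(String[] args) {
    InMemoryAgeTracker tracker = new InMemoryAgeTracker(1000, System::currentTimeMillis);
    tracker.recordWrite();
    System.out.println("flush needed right after first write: " + tracker.shouldFlush());
  }
}
```

   The timer starts only on the first write after a flush, so later writes never push the deadline back. That is the difference from an idle timeout.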
   
   With this new property, `sserver.cache.metadata.expiration` + `table.compaction.minor.maxAge` gives an upper bound on how old the data seen by an eventually consistent scan would be expected to be.
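   
   To make the arithmetic concrete, the following small Java sketch adds the two durations. The class name `StalenessBound` and the 5 and 10 minute values are assumptions picked for illustration, not defaults or recommendations, and the result ignores the time the flush itself takes.

```java
import java.time.Duration;

// Hypothetical helper, not part of Accumulo.
class StalenessBound {
  // Rough worst case: data waits up to maxAge in memory, then the scan
  // server may keep using cached tablet metadata for up to cacheExpiration.
  static Duration worstCase(Duration cacheExpiration, Duration maxAge) {
    return cacheExpiration.plus(maxAge);
  }

  public static void main(String[] args) {
    Duration cacheExpiration = Duration.ofMinutes(5); // illustrative value
    Duration maxAge = Duration.ofMinutes(10);         // illustrative value
    Duration bound = worstCase(cacheExpiration, maxAge);
    System.out.println("Rough staleness bound: " + bound.toMinutes() + " minutes");
  }
}
```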
   
   I am also wondering if `sserver.cache.metadata.expiration` should be a per-table property.  Then tablet metadata could be cached for different time periods in scan servers for different tables.  When it's a scan-server-wide property, it has to be set to meet the needs of the table with the lowest tolerance.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on issue #3397: Support maximum age for inmemory tablet data.

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on issue #3397:
URL: https://github.com/apache/accumulo/issues/3397#issuecomment-1580939396

   I wonder if `table.compaction.minor.idle` should be replaced with a property like `table.compaction.minor.interval`. Do we still need to perform a minor compaction when idle for some specified time period if we are going to flush at some interval?




[GitHub] [accumulo] keith-turner commented on issue #3397: Support maximum age for inmemory tablet data.

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on issue #3397:
URL: https://github.com/apache/accumulo/issues/3397#issuecomment-1580990595

   > I wonder if table.compaction.minor.idle should be replaced with a property like table.compaction.minor.interval. Do we still need to perform a minor compaction when idle for some specified time period if we are going to flush at some interval?
   
   The new property is not quite an interval because it's based on write activity. In the absence of continuous write activity, flushes would not occur at a set interval.  The new property would flush based on time since the first write to the in-memory map.  The current idle timeout flushes based on time since the last write to an in-memory map.
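   
   The distinction between the two timers can be sketched in Java as follows. This is an illustration only: the class `FlushTriggerComparison`, its method names, and the timing values (a write every 30 seconds, a 5 minute idle timeout, a 10 minute max age) are assumptions made up for this example.

```java
// Hypothetical sketch, not part of Accumulo.
class FlushTriggerComparison {
  // Idle trigger: measured from the LAST write to the in-memory map.
  static boolean idleExpired(long lastWriteMs, long nowMs, long idleMs) {
    return nowMs - lastWriteMs >= idleMs;
  }

  // Max-age trigger: measured from the FIRST unflushed write.
  static boolean maxAgeExpired(long firstWriteMs, long nowMs, long maxAgeMs) {
    return nowMs - firstWriteMs >= maxAgeMs;
  }

  public static void main(String[] args) {
    long idleMs = 5 * 60_000L;     // illustrative idle timeout
    long maxAgeMs = 10 * 60_000L;  // illustrative max age
    long writeEveryMs = 30_000L;   // slow but continuous writes
    long firstWrite = 0;
    long lastWrite = 0;

    // Step a simulated clock one second at a time for up to an hour.
    for (long now = 0; now <= 60 * 60_000L; now += 1_000L) {
      if (now % writeEveryMs == 0) {
        lastWrite = now; // each write resets the idle timer only
      }
      if (idleExpired(lastWrite, now, idleMs)) {
        System.out.println("idle trigger fired after " + (now / 1000) + " s");
        break;
      }
      if (maxAgeExpired(firstWrite, now, maxAgeMs)) {
        System.out.println("max-age trigger fired after " + (now / 1000) + " s");
        break;
      }
    }
  }
}
```

   Under this write pattern the idle trigger never fires, because every write resets it, while the max-age trigger fires once the first write is 10 minutes old.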
   
   The idle timeout could still be useful for immediate scans.  In that case, one may not want to flush a tablet that is being actively written to and scanned, but once it has not been written to for a while it makes sense to flush it.
   
   Maybe the new timeout makes sense for the new eventual scan use case and the idle timeout still makes sense for the existing immediate scan use case.

