Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/07/02 18:50:00 UTC

[jira] [Updated] (HBASE-24619) Try compact the recovered hfiles firstly after region online

     [ https://issues.apache.org/jira/browse/HBASE-24619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HBASE-24619:
----------------------------------
    Description: 
As discussed in HBASE-23739 and in HBASE-24632, there may be many recovered hfiles. We should find a better way to compact them first, once the region is online.

 

For instance (quoting our [~anoop.hbase]):

"Assume there were some small files created by flushes that never got compacted before the RS went down. We will look for possible candidates starting from the oldest files, and in all likelihood the very old files would get excluded because of the size math. But it is possible that newly flushed files would get selected. We also have the max files to compact config, which is 10 by default. The count of these small files alone might be >10. If there are, say, 15 WAL files to split, we will for sure have at least 15 small HFiles.
My thinking was this: after the region opens, we have to make sure these small files are compacted in one go, and we should not even consider the max files limit for this compaction. Also note that these files might not even have DBE/compression etc. applied. Coding-wise, I am not sure how cleanly it might come out."
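
To make the arithmetic in the quote above concrete, here is a minimal Java sketch of a ratio-style selection with a max-files cap. It is an illustration only, not HBase's actual compaction policy code: the class name CompactionSelectionSketch, the select method, and the example file sizes are all hypothetical. The values 1.2, 3 and 10 are assumed to mirror the defaults of hbase.hstore.compaction.ratio, hbase.hstore.compaction.min and hbase.hstore.compaction.max.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified illustration; not HBase's real selection code. */
public class CompactionSelectionSketch {

    /**
     * Selects files for one compaction. Sizes are ordered oldest to newest.
     * Assumes minFiles >= 1. Returns the sizes of the selected files.
     */
    public static List<Long> select(List<Long> sizes, double ratio,
                                    int minFiles, int maxFiles) {
        int n = sizes.size();
        // suffixSum[i] = total size of files i..n-1
        long[] suffixSum = new long[n + 1];
        for (int i = n - 1; i >= 0; i--) {
            suffixSum[i] = suffixSum[i + 1] + sizes.get(i);
        }
        int start = 0;
        // "Size math": skip old files that are much larger than all newer files combined.
        while (n - start >= minFiles
                && sizes.get(start) > ratio * suffixSum[start + 1]) {
            start++;
        }
        if (n - start < minFiles) {
            return new ArrayList<>(); // too few candidates left
        }
        // Cap the selection at maxFiles, keeping the oldest remaining candidates.
        int end = (int) Math.min((long) n, (long) start + maxFiles);
        return new ArrayList<>(sizes.subList(start, end));
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        sizes.add(10_000L);          // two large, old files (sizes in MB, made up)
        sizes.add(4_000L);
        for (int i = 0; i < 15; i++) {
            sizes.add(5L);           // 15 small recovered files, one per split WAL
        }
        List<Long> picked = select(sizes, 1.2, 3, 10);
        System.out.println("selected " + picked.size() + " of " + sizes.size() + " files");
    }
}
```

In this made-up scenario the two large files are skipped by the size check, and the cap of 10 leaves 5 of the 15 small files uncompacted after the first pass. This is the situation the suggestion to ignore the max files limit for this compaction is meant to avoid.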

 

And from our [~pankaj2461]:

"...concern is the compaction after region open, which impacts MTTR due to heavy IO in a large cluster with many outstanding WALs"

 

  was:As discussed in HBASE-23739, there may have many recovered hfiles. Should find a better way to compact them firstly after region online.


> Try compact the recovered hfiles firstly after region online
> ------------------------------------------------------------
>
>                 Key: HBASE-24619
>                 URL: https://issues.apache.org/jira/browse/HBASE-24619
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.3.0
>            Reporter: Guanghao Zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)