Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/12/14 00:57:28 UTC

[GitHub] [hbase] wchevreuil opened a new pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

wchevreuil opened a new pull request #3942:
URL: https://github.com/apache/hbase/pull/3942


   …plementations


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] joshelser commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
joshelser commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r768834336



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,

Review comment:
       ```suggestion
   performance penalties. The Amazon S3 Object Store, in particular, has been the most affected deployment
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.

Review comment:
       ```suggestion
   to guarantee atomicity of operations against S3.
   ```
   
   Maybe I'm being nit-picky here? I think it makes a confusing topic easier to understand if we just say "atomic renames", even though "consistency" and "integrity" would be things sacrificed when we have non-atomic renames :)
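
For readers following the thread, the temp-directory-then-rename commit that the quoted paragraph describes can be sketched in plain Java as below. This is a minimal illustration against the local file system using `java.nio.file`, not HBase code; the class name, the `.tmp` directory name and the file name are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RenameCommitSketch {

    // Write the file under a temporary directory first, then publish it
    // into the store directory with a single rename at commit time.
    static Path writeAndCommit(Path storeDir, String fileName, byte[] data) throws IOException {
        // ".tmp" is a hypothetical name for the temporary directory.
        Path tmpDir = Files.createDirectories(storeDir.resolve(".tmp"));
        Path tmpFile = tmpDir.resolve(fileName);
        Files.write(tmpFile, data);

        // Readers only ever see the file once this move succeeds. The scheme
        // relies on the move being atomic, which is what an object store
        // such as S3 does not provide natively.
        return Files.move(tmpFile, storeDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path storeDir = Files.createTempDirectory("store");
        Path committed = writeAndCommit(storeDir, "hfile-0001",
            "cells".getBytes(StandardCharsets.UTF_8));
        System.out.println("committed: " + committed + " exists=" + Files.exists(committed));
    }
}
```

On a file system without an atomic rename, the final move is typically emulated by copy-and-delete, which is where the performance and atomicity concerns discussed in this comment come from.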

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,

Review comment:
       ```suggestion
   due to its lack of atomic renames. The HBase community temporarily bypassed this problem by building a distributed locking layer called HBOSS
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce

Review comment:
       ```suggestion
   file systems, mainly Object Stores which can be used like file systems, HBase's dependency on atomic rename operations starts to introduce
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.

Review comment:
       ```suggestion
   The implementation can be set at the HBase service level in *hbase-site.xml* or at the Table or Column Family via the TableDescriptor configuration.
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration

Review comment:
       ```suggestion
   NOTE: When the store file tracking implementation is specified in *hbase-site.xml*, this configuration is also propagated into a table's configuration
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration
+at table creation time. This is to avoid dangerous configuration mismatches between processes, which
+could potentially lead to data loss.
+
+== Available Implementations
+
+Store File Tracking initial version provides three builtin implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when now explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.

Review comment:
       ```suggestion
   directories and renames. This is how all previous (implicit) implementations of HBase tracked store files.
   ```

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration
+at table creation time. This is to avoid dangerous configuration mismatches between processes, which
+could potentially lead to data loss.
+
+== Available Implementations
+
+Store File Tracking initial version provides three builtin implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when now explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.
+
+### FILE
+
+A file tracker implementation that creates new files straight in the store directory, avoiding the
+need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in
+each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
+store is updated and a new meta file is written with this list contents, discarding the previous
+meta file now containing an out dated list.
+
+### MIGRATION
+
+A special implementation to be used when swapping between Store File Tracking implementations on
+pre-existing tables that already contain data, and therefore, files being tracked under an specific
+logic.
+
+== Usage
+
+For fresh deployments that don't yet contain any user data, *FILE* implementation can be just set as
+value for *hbase.store.file-tracker.impl* property in global *hbase-site.xml* configuration, prior
+to the first hbase start. Omitting this property sets the *DEFAULT* implementation.
+
+### Switching implementations globally
+
+For running clusters with tables already containing data, Store File Tracking implementation can

Review comment:
       ```suggestion
   For clusters with data that are upgraded to a version of HBase containing the store file tracking feature, the Store File Tracking implementation can
   ```
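
The FILE tracker behaviour described in the quoted section (an in-memory list of committed hfiles, backed by a meta file that is rewritten on every commit) can be sketched roughly as follows. This is a toy model in plain Java, not the HBase implementation; the class name, the `meta.<n>` file naming and the one-name-per-line format are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FileTrackerSketch {
    private final Path storeDir;
    private final List<String> tracked = new ArrayList<>(); // committed files, kept in memory
    private long seq = 0;
    private Path currentMeta; // null until the first commit

    FileTrackerSketch(Path storeDir) {
        this.storeDir = storeDir;
    }

    // Commit a store file that was already written directly into storeDir:
    // update the tracked list, write a new meta file with the full list,
    // then discard the previous meta file, which now holds an outdated list.
    synchronized void commit(String hfileName) throws IOException {
        tracked.add(hfileName);
        Path newMeta = storeDir.resolve("meta." + (++seq)); // hypothetical naming
        Files.write(newMeta, tracked);
        if (currentMeta != null) {
            Files.deleteIfExists(currentMeta);
        }
        currentMeta = newMeta;
    }

    // Read the tracked list back from the current meta file.
    synchronized List<String> load() throws IOException {
        return currentMeta == null ? new ArrayList<>() : Files.readAllLines(currentMeta);
    }

    public static void main(String[] args) throws IOException {
        Path storeDir = Files.createTempDirectory("store");
        FileTrackerSketch tracker = new FileTrackerSketch(storeDir);
        for (String name : new String[] {"hfile-0001", "hfile-0002"}) {
            Files.write(storeDir.resolve(name), new byte[0]); // created in place, no rename
            tracker.commit(name);
        }
        System.out.println(tracker.load());
    }
}
```

Because the data files are created directly in the store directory, only the small meta file changes at commit time, so no rename of the data file is needed.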

##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration
+at table creation time. This is to avoid dangerous configuration mismatches between processes, which
+could potentially lead to data loss.
+
+== Available Implementations
+
+Store File Tracking initial version provides three builtin implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when now explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.
+
+### FILE
+
+A file tracker implementation that creates new files straight in the store directory, avoiding the
+need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in
+each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
+store is updated and a new meta file is written with this list contents, discarding the previous
+meta file now containing an out dated list.
+
+### MIGRATION
+
+A special implementation to be used when swapping between Store File Tracking implementations on
+pre-existing tables that already contain data, and therefore, files being tracked under an specific
+logic.
+
+== Usage
+
+For fresh deployments that don't yet contain any user data, *FILE* implementation can be just set as
+value for *hbase.store.file-tracker.impl* property in global *hbase-site.xml* configuration, prior
+to the first hbase start. Omitting this property sets the *DEFAULT* implementation.
+
+### Switching implementations globally

Review comment:
       I think Wellington means a migration case, rather than a table which already has an SFT implementation set.







[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-993056418


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 28s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   1m 43s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux 9be4c34f5522 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | Max. process+thread count | 54 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r770644789



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strong consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to the its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set at individual Table or Column Family configurations, as well as in processes
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase_site.xml*, this configuration is also saved into tables configuration
+at table creation time. This is to avoid dangerous configuration mismatches between processes, which
+could potentially lead to data loss.
+
+== Available Implementations
+
+Store File Tracking initial version provides three builtin implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when no explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.
+
+### FILE
+
+A file tracker implementation that creates new files straight in the store directory, avoiding the
+need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in
+each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
+store is updated and a new meta file is written with this list's contents, discarding the previous
+meta file, which now contains an outdated list.
+
+### MIGRATION
+
+A special implementation to be used when swapping between Store File Tracking implementations on
+pre-existing tables that already contain data, and therefore, files being tracked under a specific
+logic.
+
+== Usage
+
+For fresh deployments that don't yet contain any user data, the *FILE* implementation can simply be set as
+the value of the *hbase.store.file-tracker.impl* property in the global *hbase-site.xml* configuration, prior
+to the first hbase start. Omitting this property sets the *DEFAULT* implementation.
+
+### Switching implementations globally

Review comment:
       Filed HBASE-26586, HBASE-26587 and HBASE-26588 for these things. Will work on them soon once the feature branch is merged :)







[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-994100775


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 27s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   1m 21s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux 24611717e5b5 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | Max. process+thread count | 45 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-996063325


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  3s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   2m  9s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux 13fc112c8371 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 81c9b8793e |
   | Max. process+thread count | 41 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] wchevreuil commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
wchevreuil commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r769089392



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+### Switching implementations globally

Review comment:
       > Have you tried this operation? I do not think it works in this way...
   > 
   > The global config will only affect new tables. So I think here you need to alter the tables one by one...
   
   Partially true. As Josh said, this is possible only when no explicit SFT configuration has yet been set in either hbase-site or the table descriptor. So basically a migration from the DEFAULT tracker to FILE. Let me update this section to make it clear that this only works when these conditions are met.







[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-994101166


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  0s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   1m 56s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux e230a6395460 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | Max. process+thread count | 46 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-994110717


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 58s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ HBASE-26067 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 31s |  HBASE-26067 passed  |
   | +0 :ok: |  refguide  |   4m  4s |  branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 14s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +0 :ok: |  refguide  |   3m 44s |  patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 17s |  The patch does not generate ASF License warnings.  |
   |  |   |  19m 10s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests | dupname asflicense refguide |
   | uname | Linux 4fab6c518881 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/2/yetus-general-check/output/branch-site/book.html |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/2/yetus-general-check/output/patch-site/book.html |
   | Max. process+thread count | 65 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/2/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] wchevreuil commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
wchevreuil commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r770631946



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+### Switching implementations globally

Review comment:
       Ok, my test deployment didn't include HBASE-26263, which would basically cause the global config switch to be ignored. 
   
   >MigrateStoreFileTrackerProcedure will not always set the SFT impl to default, if we set a MIGRATION store file tracker globally, RollingUpgradeChore will set the SFT implementation to MIGRATION.
   
   We could add a validation in StoreFileTrackerFactory to not allow setting MIGRATION when no explicit SFT config has been set yet? That would prevent this issue.
   
   And yeah, allowing MIGRATION to be set globally does require the MIGRATION SRC and DST to be explicitly set to DEFAULT and FILE, respectively. Let me remove this section altogether, and update the next one to emphasise that switches are currently only supported at Table or CF level config.
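
To make the SRC/DST requirement above concrete, here is a minimal standalone sketch of such a check. It is not the real `StoreFileTrackerFactory` code: the class `SftConfigCheck` and its `validate` method are hypothetical, and the `migration.src.impl` / `migration.dst.impl` property names are assumptions of this sketch. Only `hbase.store.file-tracker.impl` and the DEFAULT fallback come from the doc under review.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper, NOT part of HBase: rejects a MIGRATION config that lacks src/dst. */
final class SftConfigCheck {
  static final String IMPL = "hbase.store.file-tracker.impl";
  // Assumed property names for the MIGRATION tracker's source and destination.
  static final String SRC = "hbase.store.file-tracker.migration.src.impl";
  static final String DST = "hbase.store.file-tracker.migration.dst.impl";

  /** Returns an error message if the config is rejected, or empty if it is accepted. */
  static Optional<String> validate(Map<String, String> conf) {
    // An absent property means the DEFAULT tracker, as stated in the Usage section.
    String impl = conf.getOrDefault(IMPL, "DEFAULT");
    if (!"MIGRATION".equals(impl)) {
      return Optional.empty();
    }
    if (conf.get(SRC) == null || conf.get(DST) == null) {
      return Optional.of("MIGRATION requires both src and dst trackers to be set explicitly");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(IMPL, "MIGRATION");
    // No src/dst yet, so this prints the rejection message.
    System.out.println(validate(conf).orElse("accepted"));
    conf.put(SRC, "DEFAULT");
    conf.put(DST, "FILE");
    System.out.println(validate(conf).orElse("accepted"));
  }
}
```

A real check would live wherever table descriptors and site configuration are merged; this sketch only looks at a single flat map.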
   
   A migration tool is highly desired, and it was in our plan to work on it once an initial version of this feature gets released.
   







[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-993056843


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  9s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   2m 29s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux c5b80543e958 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | Max. process+thread count | 47 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-993063980


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 26s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ HBASE-26067 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  8s |  HBASE-26067 passed  |
   | +0 :ok: |  refguide  |   4m  0s |  branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 47s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +0 :ok: |  refguide  |   3m 30s |  patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 18s |  The patch does not generate ASF License warnings.  |
   |  |   |  17m 56s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests | dupname asflicense refguide |
   | uname | Linux 74fdfe4eef6a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 4aa3f47aa2 |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/1/yetus-general-check/output/branch-site/book.html |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/1/yetus-general-check/output/patch-site/book.html |
   | Max. process+thread count | 78 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r770275306



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+### Switching implementations globally

Review comment:
       Oh, looking at the code, MigrateStoreFileTrackerProcedure will not always set the SFT implementation to DEFAULT. If we set a MIGRATION store file tracker globally, RollingUpgradeChore will set the SFT implementation to MIGRATION.
   
   But this does not actually work. The trick here is that we will not actually reopen all the regions in MigrateStoreFileTrackerProcedure, because we think that we have not changed anything. So this should be a bug; we should not set the SFT implementation to anything other than DEFAULT...
   
   So for me, first, we could implement a special admin API to change the SFT implementation, to hide the intermediate MIGRATION state. This can be done with a special procedure which schedules two ModifyTableProcedure instances as sub-procedures. We could also implement a special migration tool for migrating all the tables to a specific SFT implementation after upgrading. I think this is a common requirement for our users who deploy HBase on top of S3.
   
   But anyway, I do not think the migration approach described here can fully work, and it is very dangerous, since it may cause data loss (using a wrong SFT implementation is likely to cause data loss...).
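
To make the data-loss concern concrete, here is a minimal Python toy model. It is not HBase code. It assumes, following the descriptions in the quoted patch, that a DEFAULT-style tracker treats every file in the store directory as committed, while a FILE-style tracker trusts only the list in its newest meta file. All names (`FileStyleTracker`, `default_style_load`, the `.meta.N` naming) are hypothetical.

```python
import json
import os
import tempfile


def default_style_load(store_dir):
    """Toy DEFAULT-style view: every data file in the store directory is live."""
    return sorted(f for f in os.listdir(store_dir) if not f.startswith(".meta"))


class FileStyleTracker:
    """Toy FILE-style tracker: live files are whatever the newest meta file lists."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.tracked = []  # in-memory list of committed files
        self.seq = 0       # sequence number of the current meta file

    def _meta_path(self, seq):
        return os.path.join(self.store_dir, f".meta.{seq}")

    def commit(self, name):
        # Write the new list first, then discard the now-outdated meta file.
        self.tracked.append(name)
        self.seq += 1
        with open(self._meta_path(self.seq), "w") as out:
            json.dump(self.tracked, out)
        old = self._meta_path(self.seq - 1)
        if os.path.exists(old):
            os.remove(old)

    def load(self):
        with open(self._meta_path(self.seq)) as src:
            return sorted(json.load(src))


def touch(store_dir, name):
    """Create an empty file standing in for an hfile."""
    open(os.path.join(store_dir, name), "w").close()


store = tempfile.mkdtemp()
tracker = FileStyleTracker(store)

touch(store, "hfile-a")      # written straight into the store directory
tracker.commit("hfile-a")    # committed: now listed in the meta file
touch(store, "hfile-b")      # still being written, never committed

file_view = tracker.load()
default_view = default_style_load(store)
print(file_view)
print(default_view)
```

In this model the two views disagree about `hfile-b`: reading the directory with the DEFAULT-style logic would expose a file that was never committed. The opposite mismatch is also plausible under the same assumptions, since files committed by rename would never have been listed in a meta file.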







[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r768775708



##########
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##########
@@ -0,0 +1,175 @@
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles in temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strongly consistent file systems, but with the popularity of less
+consistent file systems, mainly Object Store file systems, dependency on rename operations starts to
+introduce performance penalties. Amazon S3 Object Store, in particular, has been the most affected
+deployment, due to its lack of atomic renames, requiring an additional locking layer implemented by
+HBOSS to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, the decision on where to originally create new hfiles and how to proceed
+upon commit is delegated to the specific Store File Tracking implementation.
+The implementation can be set in individual table or column family configurations, as well as in
+the processes' *hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase-site.xml*, this configuration is also saved into the table's
+configuration at table creation time. This is to avoid dangerous configuration mismatches between
+processes, which could potentially lead to data loss.
+
+== Available Implementations
+
+The initial version of Store File Tracking provides three built-in implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when no explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.
+
+### FILE
+
+A file tracker implementation that creates new files straight in the store directory, avoiding the
+need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in
+each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
+store is updated and a new meta file is written with the contents of this list, discarding the
+previous meta file, which now contains an outdated list.
+
+### MIGRATION
+
+A special implementation to be used when swapping between Store File Tracking implementations on
+pre-existing tables that already contain data, and therefore, files being tracked under a specific
+logic.
+
+== Usage
+
+For fresh deployments that don't yet contain any user data, the *FILE* implementation can simply be
+set as the value of the *hbase.store.file-tracker.impl* property in the global *hbase-site.xml*
+configuration, prior to the first HBase start. Omitting this property sets the *DEFAULT*
+implementation.
+
+### Switching implementations globally

Review comment:
       Have you tried this operation? I do not think it works this way...
   
   The global config will only affect new tables. So I think here you need to alter the tables one by one...
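
The point about the global config can be sketched with a small Python toy model. It is not the HBase API. It assumes, as the NOTE in the quoted patch states, that the tracker value from the site configuration is copied into each table's own configuration at creation time. `ToyCluster` and its methods are hypothetical, and the sketch deliberately ignores the intermediate MIGRATION step that a table holding data would need.

```python
TRACKER_KEY = "hbase.store.file-tracker.impl"


class ToyCluster:
    """Hypothetical stand-in for a cluster: one global config plus per-table configs."""

    def __init__(self, global_conf=None):
        self.global_conf = dict(global_conf or {})
        self.tables = {}

    def create_table(self, name):
        # The tracker value in effect at creation time is copied into the table.
        impl = self.global_conf.get(TRACKER_KEY, "DEFAULT")
        self.tables[name] = {TRACKER_KEY: impl}

    def alter_table(self, name, impl):
        # Per-table change, analogous to altering a single table's configuration.
        self.tables[name][TRACKER_KEY] = impl


cluster = ToyCluster()
cluster.create_table("t1")                  # created while no tracker is configured

cluster.global_conf[TRACKER_KEY] = "FILE"   # global change made afterwards
cluster.create_table("t2")                  # new table picks up FILE

before = {name: conf[TRACKER_KEY] for name, conf in cluster.tables.items()}

# Existing tables keep their saved value until each one is altered individually.
for name in list(cluster.tables):
    cluster.alter_table(name, "FILE")

after = {name: conf[TRACKER_KEY] for name, conf in cluster.tables.items()}
print(before)
print(after)
```

Here `t1` stays on DEFAULT after the global change and only moves once it is altered, which is the behaviour the review comment describes.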







[GitHub] [hbase] wchevreuil commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
wchevreuil commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-994100121


   Thanks for the review comments, @joshelser and @Apache9. I believe I have addressed all the suggestions; please take a second look at your convenience.





[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-996062878


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 27s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ HBASE-26067 Compile Tests _ |
   ||| _ Patch Compile Tests _ |
   ||| _ Other Tests _ |
   |  |   |   1m 25s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests |  |
   | uname | Linux b37a2f259a19 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 81c9b8793e |
   | Max. process+thread count | 54 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] joshelser merged pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
joshelser merged pull request #3942:
URL: https://github.com/apache/hbase/pull/3942


   





[GitHub] [hbase] Apache-HBase commented on pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#issuecomment-996080995


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 32s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ HBASE-26067 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m  3s |  HBASE-26067 passed  |
   | +0 :ok: |  refguide  |   5m 45s |  branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m  1s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +0 :ok: |  refguide  |   5m 17s |  patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 22s |  The patch does not generate ASF License warnings.  |
   |  |   |  26m 45s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/3942 |
   | Optional Tests | dupname asflicense refguide |
   | uname | Linux 3ea6e14b5641 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | HBASE-26067 / 81c9b8793e |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/3/yetus-general-check/output/branch-site/book.html |
   | refguide | https://nightlies.apache.org/hbase/HBase/HBase-PreCommit-GitHub-PR/PR-3942/3/yetus-general-check/output/patch-site/book.html |
   | Max. process+thread count | 63 (vs. ulimit of 30000) |
   | modules | C: . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3942/3/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   

