Posted to commits@hudi.apache.org by xu...@apache.org on 2022/10/27 18:59:13 UTC

[hudi] branch asf-site updated: [HUDI-1570] Add "average record size in a commit" to FAQ (#7072)

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 84bc8cd60f [HUDI-1570] Add "average record size in a commit" to FAQ (#7072)
84bc8cd60f is described below

commit 84bc8cd60f1540eaf77262df0f6065c73d96fc68
Author: Jon Vexler <jb...@gmail.com>
AuthorDate: Thu Oct 27 11:59:07 2022 -0700

    [HUDI-1570] Add "average record size in a commit" to FAQ (#7072)
---
 website/docs/faq.md                          | 5 +++++
 website/versioned_docs/version-0.12.1/faq.md | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/website/docs/faq.md b/website/docs/faq.md
index 793b459c8d..ae24d86ecc 100644
--- a/website/docs/faq.md
+++ b/website/docs/faq.md
@@ -632,6 +632,11 @@ Cloudera CDP stack, causing the conflict.  To get around the RuntimeException, y
 `hbase.defaults.for.version.skip` to `true` in the `hbase-site.xml` configuration file, e.g., overwriting the config
 within the Cloudera manager.
 
+### How can I find the average record size in a commit?
+The `commit showpartitions` command in the [Hudi CLI](https://hudi.apache.org/docs/cli) shows both "bytes written" and
+"records inserted" for a commit. Divide bytes written by records inserted to get the average record size. Note that this
+assumes metadata overhead is negligible, which does not hold for a small dataset (such as 5 columns, 100 records).
+
 ## Contributing to FAQ
 
 A good and usable FAQ should be community-driven and crowd source questions/thoughts across everyone.
diff --git a/website/versioned_docs/version-0.12.1/faq.md b/website/versioned_docs/version-0.12.1/faq.md
index ac8c2aec6e..65dee20ce1 100644
--- a/website/versioned_docs/version-0.12.1/faq.md
+++ b/website/versioned_docs/version-0.12.1/faq.md
@@ -627,6 +627,11 @@ Cloudera CDP stack, causing the conflict.  To get around the RuntimeException, y
 `hbase.defaults.for.version.skip` to `true` in the `hbase-site.xml` configuration file, e.g., overwriting the config
 within the Cloudera manager.
 
+### How can I find the average record size in a commit?
+The `commit showpartitions` command in the [Hudi CLI](https://hudi.apache.org/docs/cli) shows both "bytes written" and
+"records inserted" for a commit. Divide bytes written by records inserted to get the average record size. Note that this
+assumes metadata overhead is negligible, which does not hold for a small dataset (such as 5 columns, 100 records).
+
 ## Contributing to FAQ
 
 A good and usable FAQ should be community-driven and crowd source questions/thoughts across everyone.
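
The calculation described in the new FAQ entry can be sketched as follows. This is a minimal illustration, not part of the commit: the function name `average_record_size` and the sample totals are hypothetical, and the two inputs are assumed to be the "bytes written" and "records inserted" figures read manually from the CLI output for a commit.

```python
def average_record_size(bytes_written: int, records_inserted: int) -> float:
    """Return the average bytes per record for a commit.

    Both arguments are assumed to be totals copied by hand from the
    Hudi CLI output; this helper does not call Hudi itself.
    """
    if records_inserted <= 0:
        raise ValueError("records_inserted must be positive")
    # The result includes file-level metadata overhead, so it overstates
    # the true record size when the dataset is very small.
    return bytes_written / records_inserted


# Hypothetical totals for one commit: 50 MiB written, 400,000 records.
print(average_record_size(52_428_800, 400_000))  # 131.072
```

As the FAQ entry notes, the result is only a reasonable estimate when the data volume is large enough that metadata overhead is negligible.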