Posted to reviews@kudu.apache.org by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org> on 2017/07/25 21:59:46 UTC

[kudu-CR] [docs] Bump the data size and tablets limits

Hello Adar Dembo, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7503

to review the following change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................

[docs] Bump the data size and tablets limits

Some of the changes that landed in 1.4.0, namely Todd's memory consumption
and log segments improvements, plus the beginning of Adar's thread consolidation
effort, make it so that it's easier for Kudu to store more data per node.

Some notes:
 - Memory consumption now seems to be around 1.5GB / TB of data on disk after
   startup for a TPC-H lineitem table.
 - File descriptor consumption is about 2 per log segment plus 1 per log index.
   Tablets with some replication lag will use more segments. To that is added
   the fd cache that defaults to 40% of the configured max fds.
 - Thread usage is about 5 for hot replicas, then 2 when they become cold (new
   1.4.0 concept that Todd added).

Based on the above, doubling our current limitations of 4TB spread over 1000
tablets to 8TB spread over 2000 means that:
 - 8TB requires at least 12GB of memory, then some more for the MRS, block cache,
   and scanners (around 256KB per column per scan).
 - 6000 fds are required to spin up 2000 tablets, plus what the fd cache uses.
 - 10k threads are required just to start Kudu.
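
The arithmetic behind the three figures above can be sketched roughly as
follows. This is only an illustration built from the numbers quoted in the
notes, not code from Kudu; the function and field names are hypothetical, and
it assumes a single log segment and a single log index per tablet at startup
(which is what 6000 fds for 2000 tablets implies). The fd cache, MRS, block
cache, and scanner memory are deliberately left out.

```python
from dataclasses import dataclass

# Rough per-unit costs quoted in the notes (approximations, not guarantees).
MEMORY_GB_PER_TB = 1.5        # memory after startup, TPC-H lineitem table
FDS_PER_LOG_SEGMENT = 2
FDS_PER_LOG_INDEX = 1
THREADS_PER_HOT_REPLICA = 5
THREADS_PER_COLD_REPLICA = 2


@dataclass
class StartupEstimate:
    memory_gb: float   # baseline only; excludes MRS, block cache, scanners
    fds: int           # excludes whatever the fd cache uses
    threads_hot: int   # if every replica is hot
    threads_cold: int  # if every replica has gone cold


def estimate_startup_resources(data_tb, tablets, segments_per_tablet=1):
    """Estimate baseline resources for one tablet server.

    segments_per_tablet=1 is an assumption; tablets with replication lag
    will hold more segments and therefore more fds.
    """
    fds_per_tablet = (FDS_PER_LOG_SEGMENT * segments_per_tablet
                      + FDS_PER_LOG_INDEX)
    return StartupEstimate(
        memory_gb=MEMORY_GB_PER_TB * data_tb,
        fds=fds_per_tablet * tablets,
        threads_hot=THREADS_PER_HOT_REPLICA * tablets,
        threads_cold=THREADS_PER_COLD_REPLICA * tablets,
    )


if __name__ == "__main__":
    # Proposed limits: 8TB spread over 2000 tablets.
    print(estimate_startup_resources(8, 2000))
    # memory_gb=12.0, fds=6000, threads_hot=10000, threads_cold=4000
```

Running the same function with the current limits (4TB, 1000 tablets) gives
half of each figure, which is the sense in which doubling is being compared.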

Doubling then seems "safe". Maybe we should also recommend something different
for the amount of RAM necessary? Currently we say that Kudu needs at least 4GB
but ideally more than 10GB.

Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
---
M docs/known_issues.adoc
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/03/7503/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7503
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


[docs] Bump the data size and tablets limits

Some of the changes that landed in 1.4.0, namely Todd's memory consumption
and log segments improvements, plus the beginning of Adar's thread consolidation
effort, make it so that it's easier for Kudu to store more data per node.

Some notes (mostly coming from Adar):
 - Memory consumption now seems to be around 1.5GB / TB of data on disk after
   startup for a TPC-H lineitem table.
 - File descriptor consumption is about 2 per log segment plus 1 per log index.
   Tablets with some replication lag will use more segments. To that is added
   the fd cache that defaults to 40% of the configured max fds.
 - Thread usage is about 5 for hot replicas, then 2 when they become cold (new
   1.4.0 concept that Todd added).

Based on the above, doubling our current limitations of 4TB spread over 1000
tablets to 8TB spread over 2000 means that:
 - 8TB requires at least 12GB of memory, then some more for the MRS, block cache,
   and scanners (around 256KB per column per scan).
 - 6000 fds are required to spin up 2000 tablets, plus what the fd cache uses.
 - 10k threads are required just to start Kudu.

Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Reviewed-on: http://gerrit.cloudera.org:8080/7503
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>
---
M docs/known_issues.adoc
1 file changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved; Verified




Gerrit-MessageType: merged
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


Patch Set 2: Code-Review+2


Gerrit-MessageType: comment
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Todd Lipcon, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7503

to look at the new patch set (#2).

Change subject: [docs] Bump the data size and tablets limits
......................................................................

[docs] Bump the data size and tablets limits

Some of the changes that landed in 1.4.0, namely Todd's memory consumption
and log segments improvements, plus the beginning of Adar's thread consolidation
effort, make it so that it's easier for Kudu to store more data per node.

Some notes (mostly coming from Adar):
 - Memory consumption now seems to be around 1.5GB / TB of data on disk after
   startup for a TPC-H lineitem table.
 - File descriptor consumption is about 2 per log segment plus 1 per log index.
   Tablets with some replication lag will use more segments. To that is added
   the fd cache that defaults to 40% of the configured max fds.
 - Thread usage is about 5 for hot replicas, then 2 when they become cold (new
   1.4.0 concept that Todd added).

Based on the above, doubling our current limitations of 4TB spread over 1000
tablets to 8TB spread over 2000 means that:
 - 8TB requires at least 12GB of memory, then some more for the MRS, block cache,
   and scanners (around 256KB per column per scan).
 - 6000 fds are required to spin up 2000 tablets, plus what the fd cache uses.
 - 10k threads are required just to start Kudu.

Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
---
M docs/known_issues.adoc
1 file changed, 3 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/03/7503/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7503/1//COMMIT_MSG
Commit Message:

PS1, Line 29: Doubling then seems "safe". Maybe we should also recommend something different
            : for the amount of RAM necessary? Currently we say that Kudu needs at least 4GB
            : but ideally more than 10GB.
> yea, I think adding just a note in that "scale limitations" section that sa
Done



Gerrit-MessageType: comment
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


Patch Set 2: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


Patch Set 1: Code-Review+1


Gerrit-MessageType: comment
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [docs] Bump the data size and tablets limits

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [docs] Bump the data size and tablets limits
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7503/1//COMMIT_MSG
Commit Message:

PS1, Line 29: Doubling then seems "safe". Maybe we should also recommend something different
            : for the amount of RAM necessary? Currently we say that Kudu needs at least 4GB
            : but ideally more than 10GB.
Yeah, I think just adding a note in that "scale limitations" section that says something like: if you are approaching these limits, it is necessary to ensure that the tablet server is given at least 16GB of RAM?



Gerrit-MessageType: comment
Gerrit-Change-Id: Ie60d2c3548c402c6a08db9bb724bc6367db989ca
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes