You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2016/09/14 06:20:36 UTC

[kudu-CR] [benchmarks] introduced insert loadgen tool

Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4412

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................

[benchmarks] introduced insert_loadgen tool

A new simple load generation tool which allows to measure performance
of the Kudu C++ client library in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND
modes for 'push-as-much-as-you-can' scenarios: the client generates
and pushes as much data to tablet servers as it can given the
parameters for buffering/batching.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/insert_loadgen.cc
2 files changed, 414 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#18).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 714 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/18
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 12:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS11, Line 358:   }
              :   if (total_row_count != nullptr) {
              :     *total_row_count = accumulate(row_count.begin(), row_count.end(), 0UL);
              :   }
> Looks like it's still here?
Oops, my bad.  I'm surprised myself -- probably, I got distracted.  Let me remove it, though.


http://gerrit.cloudera.org:8080/#/c/4412/12/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS12, Line 21: options
> s/options/option ?
Done


Line 33: //     --num_threads=1 \
> don't you need master_addresses here as required params ? also applicable i
Good catch -- these examples became stale once I switched to the mandatory master(s) location.  Will update


PS12, Line 75: master server(s) location
> as per the code below there is no default_params for master server location
Yes, correct -- since changing to use master addresses as mandatory parameter, this no longer has default value.  Will update.


Line 80: //     bench_c_00_03
> Same as above. Also here and above, we could keep this name intuitive to re
Done


Line 119: using kudu::Schema;
> I think Schema is not being used anywhere.
Done


Line 131: using kudu::client::sp::shared_ptr;
> Actually a basic Qn here: why do we use this special shared_ptr  as opposed
It's just a typedef: it might be either std::shared_ptr or std::tr1::shared_ptr, check $REPO_ROOT/src/kudu/client/shared_ptr.h  That's because Kudu C++ client API uses this notation to be compatible with C++03.


PS12, Line 142: 1000000
> Nit: Not sure if you want to insert 1 million by default, if I am a curious
1M rows are inserted really fast: at my laptop it's just 5 seconds.  Also, the idea was to create some _load_ :)  With 1K rows there is no load at all.  Besides, take into the consideration that the original insert-generated-rows tools was running with no limit at generated rows at all.

So, what do you think would be the right number here?


Line 194:   int64_t NextImpl() {
> Do we want to use uint64_t here ? Although Random.Next64() seems to be fill
OK, that makes sense.  Will update to uint64_t.


PS12, Line 215: != 0
> this check is prolly redundant ?
As I remember, Tidy Bot thinks it's better for readability: that's the reason I left it here.  I'll try to get rid of it -- let's see what it will say.


PS12, Line 285: case UNKNOWN_DATA:  // fall-through
> redundant ?
Yes, it seems so.  I copied the switch enumerators from some place and left this UNKNOWN_DATA as is.  Will remove.


Line 302:              FLAGS_buffer_flush_watermark_pct));
> I vaguely recall cpp guideline saying next spill being after 4 spaces I gue
QtCreator's editor does this sometimes.  Will fix.


PS12, Line 314: FLAGS_num_rows_per_thread
> consider using variable FLAGS_ directly to be consistent with other places.
I wanted to have a local variable since I access it in multiple places.


PS12, Line 315: num_rows_per_thread == 0
> given it's default value being 1M, what's this check for ?
Take a look at the comment for the '--num_rows_per_thread': 0 means no limit.


Line 324:   if (row_count != nullptr) {
> DCHECK or CHECK for this, unless we have a use case where it could actually
I don't see why DCHECK or CHECK is needed here.  One can pass nullptr as rowcount: that means there isn't any interest in getting the count.  It's a ubiquitous notation for pointer out parameters, which I want to have here, even if nobody uses it right now.


Line 403:   Stopwatch sw(Stopwatch::ALL_THREADS);
> Stupid Qn: Does ALL_THREADS here mean all threads on CPU or all threads gen
Threads in this process, so the monitoring/main thread will also be counted in.

I don't want LOG_TIMING, since I want to output that information in my own format, not what LOG_TIMING provides.


PS12, Line 440: optional scan afterwards
> I think this need not be made explicit since this is listed in optional par
I want this to be present to emphasize that there are two parts: the write and the read one, even if the latter is optional.  That's an essence of this test in a terse phrasing, otherwise it would be necessary to read through all the list of options to understand about the read part.


PS12, Line 452: kTableNameArg
> Can we not make this table_name optional and create one if the user doesn't
The original approach was to be able to use any table structure, not only some fixed pre-generated one which the test would create.  So, this test retrieves the table schema out of existing table it's pointed at.  That's because specifying custom table structure via parameters of this test would be cumbersome.
  
We can add this an option, with pairing auto-delete option. In case of auto-created table we can safely delete it if nobody asks to leave it alone after the test is completed.


PS12, Line 455:   
> Nit: Double space here.
Yes, here and elsewhere in the description for parameters: this is intentional (that's easier for reading, IMO).  Or we have some sort of style guide for having one space for those?


Line 457:           "the inserted rows." })
> Ideally, we would not want this behavior of test leaving the test data on t
I'm strongly opposed to auto-deleting any data by the test in its current implementation, since this test might be run in production environment.  Nothing prevents a user pointing to already existing table with some data, and I don't want to delete any useful data along with the data generated by the test.

To me, using the approach you are advocating makes more sense in case of auto-created tables, because in that case we can have a guarantee the table was created by the test and not existed before.  I think we can use auto-generated table names for that and be safe.  I'll take a look at that approach.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Dinesh Bhat (Code Review)" <ge...@cloudera.org>.
Dinesh Bhat has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4412/12/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS12, Line 452: t of master a
> The original approach was to be able to use any table structure, not only s
Yeah, my comment at L457 was kinda related to this. i.e, I am not sure if we need to be flexible wrt schemas - we can have one static schema table and load generate and delete the table at the end of the test with an option to keep it if we want to append rows later. Basically when it comes to tests, one default approach could be 'leave_no_trace' philosophy. Imagine if I run this tool continuously for 100 times without exposing a button to delete the data generated from these tests, which means we are growing the data in unbounded fashion.


PS12, Line 455: et
> Yes, here and elsewhere in the description for parameters: this is intentio
Other tools seem to have followed just one space, so was trying to be consistent with them.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 18: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 18
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/3421/

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#6).

Change subject: [tools] updated insert-generated-rows tool
......................................................................

[tools] updated insert-generated-rows tool

Added ability to run multiple inserter threads and specify different
parameters on batching/buffering of the generated write operations.
Now it's possible to run the data generating session both in
MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Also, introduced
different modes for the row generator.  Overall, the changes allow
allow to use the tool to measure performance of the Kudu C++ client
library in 'push-as-much-as-you-can' scenarios: the client generates
and sends as much data to tablet servers as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/insert-generated-rows.cc
1 file changed, 394 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#7).

Change subject: [tools] updated insert-generated-rows tool
......................................................................

[tools] updated insert-generated-rows tool

Added ability to run multiple inserter threads and specify different
parameters on batching/buffering of the generated write operations.
Now it's possible to run the data generating session both in
MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Also, introduced
different modes for the row generator.  Overall, the changes allow
allow to use the tool to measure performance of the Kudu C++ client
library in 'push-as-much-as-you-can' scenarios: the client generates
and sends as much data to tablet servers as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/insert-generated-rows.cc
1 file changed, 394 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#13).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
A src/kudu/tools/kudu-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 705 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4412/21/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS21, Line 413:  = Status::OK();
> Nit: this is the default value, can omit.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051",
              :               "Comma separated list of master addresses to run against; "
              :               "addresses are 'hostname:port' form, where port may be omitted "
              :               "if a master server listens at the default port.");
> Certainly, it's better to keep semantics and usage of the master_addresses 
Maybe:
1) As part of this commit, make it mandatory (to be consistent with the rest).
2) As a follow-up, if you're interested, change them all.


Line 494:       .Description(
> You mean ExtraDescription()?  OK, will do.  I'm also thinking to add exampl
Yeah, I think a couple examples aren't a bad idea. Just run "kudu test loadgen --help" when you're done to make sure it looks OK.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/3419/

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#8).

Change subject: [tools] updated insert-generated-rows tool
......................................................................

[tools] updated insert-generated-rows tool

Added ability to run multiple inserter threads and specify additional
parameters on batching behavior of the generated write operations.
Now it's possible to run data generating sessions both in MANUAL_FLUSH
and AUTO_FLUSH_BACKGROUND modes.  Also, introduced sequential and
random modes for the data generator.  Overall, the change allow to use
the tool to measure performance of the Kudu C++ client library
in simplistic 'push-as-much-as-you-can' scenario: the client generates
and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/insert-generated-rows.cc
1 file changed, 394 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................


Patch Set 3:

> havent looked at the code yet, but from the description at the top
 > of the file this looks pretty similar to "insert-generated-rows".
 > Could we consolidate the functionality here?

Yes, sure.  I remember you mentioned that I was free to completely replace the insert-generated-rows, but it seemed to me that people might prefer to keep something some functionality from it.  E.g., unlimited number of rows to insert.

Will do.

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#19).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 722 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/19
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 11:

(26 comments)

Will post updated code shortly.

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

Line 68:   tool_action_test.cc
> Nit: should come after tablet.
Done


http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

> I'm sure a lot of my comments here are to old code (that is, to code that y
Sure, no problem.  I really appreciate your guidance and feedback! :)


PS11, Line 19: which allows to push data
> Nit: "which pushes data"
Done


PS11, Line 26: records
> Nit: let's use 'rows' instead of 'records'; they mean the same thing but we
Done


PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051",
              :               "Comma separated list of master addresses to run against; "
              :               "addresses are 'hostname:port' form, where port may be omitted "
              :               "if a master server listens at the default port.");
> Hmm. The default value is user friendly, but it's inconsistent with how oth
Certainly, it's better to keep semantics and usage of the master_addresses consistent across different utilities.  I put this parameter as optional since 'master_address' in the original insert-generated-rows utility was optional and had the default value of 'localhost'.

What do you think is best: keep the approach when this parameter is mandatory or modify other utilities to make it optional (btw, Todd told me that we can break backward-compatibility of the insert-generated-rows tool, so let's don't consider backward-compatibility for the insert-generated-rows as a constraint here) ?


Line 132: DEFINE_int32(test_num_threads, 2,
> Maybe add here that each thread has its own session, so that it's easier to
Done


PS11, Line 138: "Size of the session's mutation buffer, in bytes."
> Nit: reword so that it's clear that this will be the size of each mutation 
Done


PS11, Line 155: Using the preset data allows to squeeze "
              :               "more performance from the code.
> It's not really clear what this means. Is it a recommendation to use the de
I was trying to say that client generates more data per second using a pre-set data string compared with generated strings of the same length if run with the same hardware configuration.   Probably, I'll put this explanation into the description of the parameter.


Line 247:     srandom(time(nullptr));
> Why do we need this? Aren't we already seeding each Random in its construct
Good catch -- this is a leftover from previous revisions.  Will remove.


Line 270:   vector<thread> threads_;
> InsertLoadgen is stateless apart from this, and the threads are created and
Good idea -- will do.


PS11, Line 275:   const Schema* schema(row->schema());
> Nit: this is only used on the next line; can we combine them?
Done


PS11, Line 330: KUDU_CHECK_OK
> And let's use CHECK_OK here.
Done


Line 332:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
> Let's drop this override.
Done


PS11, Line 358:   Status s = session->Flush();
              :   if (!s.ok()) {
              :     cout << "Encountered errors: " << s.ToString() << endl;
              :   }
> Should this also be gated on flush_per_n_rows_ not being 0? Or is the idea 
Yep, since there is a report on the errors below, it makes sense to omit this for all session flush modes.  I will remove this.


PS11, Line 399: KUDU_RETURN_NOT_OK
> Nit: should use RETURN_NOT_OK() for all of these.
Done


PS11, Line 401:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
> Why lower the timeout? The default value is 30s, which is much safer.
I copied it over from some example; no particular reason.  Will drop the override.


PS11, Line 410:     KUDU_RETURN_NOT_OK(scanner.NextBatch(&results));
> Let's take this opportunity to use the new NextBatch API instead.
Done


PS11, Line 427:   SleepFor(MonoDelta::FromMilliseconds(50));
> To be honest, I think I'd prefer the warning over the sleep. In fact, maybe
OK, that sounds reasonable.  I'll drop this.  There might be other warning messages like information on DNS resolution timings, so this particular warning does not look too weird after all.


PS11, Line 432: const char* const kTableNameArg = "table_name";
              : const char* const kTableNameDesc = "Name of an existing table to use "
              :     "for the test. The test will determine the structure of the table column "
              :     "schema and populate it accordingly.  It's higly recommended to use "
              :     "a dedicated table created just for testing purposes: the test does not "
              :     "delete the inserted rows.";
> Nit: seems like an odd placement for this, between two static functions. Wh
Done


PS11, Line 440: const string
> Maybe cref?
Yes, sure.  It looks like a copy-and-paste from other place.  Will fix it there as well.


Line 446:   InsertLoadgen bench(master_addrs, table_name,
> With InsertLoadgen having no external callers, it's worth reevaluating vari
Good idea -- that would simplify this a bit.  I had some other ideas for InsertLoadGen as a class, but since it's not used anywhere else, I think it's worth to reduce it to a number of functions.


PS11, Line 466:     ostringstream s;
              :     s << "Encountered " << total_err_count << " errors";
              :     return Status::RuntimeError(s.str());
> How about using strings::Substitute here instead?
Done


PS11, Line 479:       ostringstream s;
              :       s << "Row count mismatch: expected " << total_row_count
              :         << "; actual " << count;
              :       return Status::RuntimeError(s.str());
> Likewise here.
Done


Line 494:       .Description(
> Can you split this up using Description() and LongDescription()? See ksck f
You mean ExtraDescription()?  OK, will do.  I'm also thinking to add examples of usage there.  Does it make sense?


PS11, Line 497: higly
> highly
Done


Line 500:       .AddOptionalParameter("test_master_addresses")
> Nit: please sort this list alphabetically. Makes it easier to find paramete
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] updated insert-generated-rows tool
......................................................................


Patch Set 6:

(1 comment)

Would be nice to see a small integration test that runs the tool against a cluster then verifies that it indeed inserted rows. Our CLI tools have a tendency to bit-rot because of the lack of tests; we're trying to change that with the new CLI tool.

On that note, what do you think of incorporating this as an action in the new CLI tool? You'll get most of main() for free, but it means this will ship to users. Do you think it'd a useful tool for them?

http://gerrit.cloudera.org:8080/#/c/4412/6/src/kudu/tools/insert-generated-rows.cc
File src/kudu/tools/insert-generated-rows.cc:

Line 116: DEFINE_string(master_address, "localhost:7051",
Nit: can we change this to master_addresses?


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/3420/

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#5).

Change subject: [tools] updated insert-generated-rows tool
......................................................................

[tools] updated insert-generated-rows tool

Added ability to run multiple inserter threads and specify different
parameters on batching/buffering of the generated write operations.
Now it's possible to run the data generating session both in
MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Also, introduced
different modes for the row generator.  Overall, the changes allow
allow to use the tool to measure performance of the Kudu C++ client
library in 'push-as-much-as-you-can' scenarios: the client generates
and sends as much data to tablet servers as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/insert-generated-rows.cc
1 file changed, 394 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4412/16/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

Line 180:   string tool_path_;
> Done
Can you also add a new section to TestModeHelp for the new action?


PS16, Line 788: ema(client::Kud
> Because otherwise it will be an error like
Sorry, I meant why do you need a vector<string>() wrapper?


http://gerrit.cloudera.org:8080/#/c/4412/17/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

PS17, Line 79: using std::copy;
             : using std::back_inserter;
Nit: back_inserter should go first.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#12).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
16 files changed, 522 insertions(+), 178 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4412/16/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

Line 180: TEST_F(ToolTest, TestTopLevelHelp) {
> Can you update this test with the new mode and action?
Done


PS16, Line 781: client::sp::
> Nit: you added a "using" for this at the top of the file.
Done


PS16, Line 788: vector<string>(
> Why is this needed?
Because otherwise it will be an error like

Bad status: Invalid argument: Table partitioning must be specified using add_hash_partitions or set_range_partition_columns


PS16, Line 789: wait(true)
> Can omit, this is the default behavior.
Done


PS16, Line 798: ASSERT_OK(cluster->Start());
> You can move this into LoadgenCreateMiniCluster if you want. You could even
Good idea.  For some reason, I thought keeping separate functions would be better, but apparently there isn't any demand for that.  Will update.


PS16, Line 819: string("-table_name=") + kTableName
> Nit: Substitute("-table_name=%s", kTableName) is more idiomatic.
Done


Line 820:     "-buffer_flush_watermark_pct=0.125",
> Nit: can you use -- to prefix gflags, for consistency with existing tests?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 16:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4412/16/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

Line 180: TEST_F(ToolTest, TestTopLevelHelp) {
Can you update this test with the new mode and action?


PS16, Line 781: client::sp::
Nit: you added a "using" for this at the top of the file.


PS16, Line 788: vector<string>(
Why is this needed?


PS16, Line 789: wait(true)
Can omit, this is the default behavior.


PS16, Line 798: ASSERT_OK(cluster->Start());
You can move this into LoadgenCreateMiniCluster if you want. You could even move LoadgenCreateTable() (enabled via an argument) and Subprocess::Call(), then call it RunLoadgenTest() or something. Basically make the TEST_F() functions themselves just thin wrappers that provide some additional arguments.


PS16, Line 819: string("-table_name=") + kTableName
Nit: Substitute("-table_name=%s", kTableName) is more idiomatic.


Line 820:     "-buffer_flush_watermark_pct=0.125",
Nit: can you use -- to prefix gflags, for consistency with existing tests?


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#17).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 708 insertions(+), 182 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/17
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 17
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#15).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
A src/kudu/tools/kudu-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 758 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/15
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#9).

Change subject: WIP: [tools] added insert-generated-rows into kudu tools
......................................................................

WIP: [tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
6 files changed, 457 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................


Patch Set 3:

havent looked at the code yet, but from the description at the top of the file this looks pretty similar to "insert-generated-rows". Could we consolidate the functionality here?

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#4).

Change subject: [tools] updated insert-generated-rows tool
......................................................................

[tools] updated insert-generated-rows tool

Added ability to run multiple inserter threads and specify different
parameters on batching/buffering of the generated write operations.
Now it's possible to run the data generating session both in
MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Also, introduced
different modes for the row generator.  Overall, the changes allow
allow to use the tool to measure performance of the Kudu C++ client
library in 'push-as-much-as-you-can' scenarios: the client generates
and sends as much data to tablet servers as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/insert-generated-rows.cc
1 file changed, 396 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

PS15, Line 108: ADD_KUDU_TEST(kudu-test)
              : ADD_KUDU_TEST_DEPENDENCIES(kudu-test
              :   kudu)
Why not reuse kudu-tool-test, which is where all of the new CLI tests (apart from the ported ones in kudu-ts-cli-test and kudu-admin-test) live?


http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/kudu-test.cc
File src/kudu/tools/kudu-test.cc:

Line 37: 
These are nice, but can we at least verify that something was written with a scan? Can we verify an exact row count too?


http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

Line 110: #include "kudu/util/random.h"
Nit: should precee stopwatch (surprised Tidy Bot didn't mention it?)


PS15, Line 152: options allows "
              :             "to keep
Nit: option retains


PS15, Line 156: is not in effect
Nit: has no effect


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#22).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 751 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/22
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#11).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
6 files changed, 525 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/kudu-test.cc
File src/kudu/tools/kudu-test.cc:

Line 37: 
> The exact row count is verified by the utility itself if adding '-run_scan'
Ah, forgot about -run_scan. No, then this is fine.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 12:

(1 comment)

Looks good, though see my previous comment (PS11) about adding a few tests.

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS11, Line 358:   }
              :   if (total_row_count != nullptr) {
              :     *total_row_count = accumulate(row_count.begin(), row_count.end(), 0UL);
              :   }
> Yep, since there is a report on the errors below, it makes sense to omit th
Looks like it's still here?


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4412/20/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS20, Line 401: supposed
Nit: "assumed", right?


PS20, Line 405:   // It's necessary to have read-what-you-write behavior here.  Since
              :   // tablet leader can change and there might be replica propagation delays,
              :   // set the snapshot to the latest observed one to get correct row count.
As JD discovered with ITClient, if you end up scanning from a replica that is behind the desired timestamp, it'll only wait a certain amount of time to catch up before timing out the scan. You may have to work around that here.

See KUDU-1656 for more details.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#14).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
A src/kudu/tools/kudu-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 769 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#10).

Change subject: WIP: [tools] added insert-generated-rows into kudu tools
......................................................................

WIP: [tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_main.cc
6 files changed, 456 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4412/16/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

Line 180: TEST_F(ToolTest, TestTopLevelHelp) {
> Can you also add a new section to TestModeHelp for the new action?
Done


PS16, Line 788: vector<string>(
> Sorry, I meant why do you need a vector<string>() wrapper?
Ah, sorry for the misunderstanding.  That's not needed: good catch!


http://gerrit.cloudera.org:8080/#/c/4412/17/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

PS17, Line 79: using std::unique_ptr;
             : using std::vector;
> Nit: back_inserter should go first.
Sounds like a pun :)

Fixed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 11:

(26 comments)

Can you add a couple things to kudu-tool-test?
1. Modify the help tests to incorporate the new mode and action.
2. Add a quick test or two for the new action. You don't have to test every possible gflag value, but some end-to-end coverage would be nice, to prevent it from bitrotting the way insert-generated-rows could.

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

Line 68:   tool_action_test.cc
Nit: should come after tablet.


http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

I'm sure a lot of my comments here are to old code (that is, to code that you didn't write). That's not completely fair, but one of the reasons I asked you to move this into the new CLI tool was so that we can take the opportunity to rewrite this code using the Kudu style du jour.

I guess what I'm saying is, thanks in advance for being patient. :)


PS11, Line 19: which allows to push data
Nit: "which pushes data"


PS11, Line 26: records
Nit: let's use 'rows' instead of 'records'; they mean the same thing but we can avoid some confusion by reusing terminology.


PS11, Line 128: DEFINE_string(test_master_addresses, "localhost:7051",
              :               "Comma separated list of master addresses to run against; "
              :               "addresses are 'hostname:port' form, where port may be omitted "
              :               "if a master server listens at the default port.");
Hmm. The default value is user friendly, but it's inconsistent with how other actions handle master_addresses. I'd prefer we keep them consistent, which means either making this a required parameter without a default value, or changing the others to provide default values (and accepting the backwards incompatibility).

What do you think?


Line 132: DEFINE_int32(test_num_threads, 2,
Maybe add here that each thread has its own session, so that it's easier to understand the effects of the session gflags.


PS11, Line 138: "Size of the session's mutation buffer, in bytes."
Nit: reword so that it's clear that this will be the size of each mutation buffer that the session has.


PS11, Line 155: Using the preset data allows to squeeze "
              :               "more performance from the code.
It's not really clear what this means. Is it a recommendation to use the default value of test_string_fixed?


Line 247:     srandom(time(nullptr));
Why do we need this? Aren't we already seeding each Random in its constructor?


Line 270:   vector<thread> threads_;
InsertLoadgen is stateless apart from this, and the threads are created and destroyed in the context of Run(). So can we make the entire class stateless?


PS11, Line 275:   const Schema* schema(row->schema());
Nit: this is only used on the next line; can we combine them?


PS11, Line 330: KUDU_CHECK_OK
And let's use CHECK_OK here.


Line 332:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
Let's drop this override.


PS11, Line 358:   Status s = session->Flush();
              :   if (!s.ok()) {
              :     cout << "Encountered errors: " << s.ToString() << endl;
              :   }
Should this also be gated on flush_per_n_rows_ not being 0? Or is the idea to force AUTO_FLUSH_BACKGROUND to finish up everything that's been buffered? If so, should we at least ignore the return value in AUTO_FLUSH_BACKGROUND mode? Or is it still useful?


PS11, Line 399: KUDU_RETURN_NOT_OK
Nit: should use RETURN_NOT_OK() for all of these.


PS11, Line 401:       .default_admin_operation_timeout(MonoDelta::FromSeconds(8))
Why lower the timeout? The default value is 30s, which is much safer.


PS11, Line 410:     KUDU_RETURN_NOT_OK(scanner.NextBatch(&results));
Let's take this opportunity to use the new NextBatch API instead.


PS11, Line 427:   SleepFor(MonoDelta::FromMilliseconds(50));
To be honest, I think I'd prefer the warning over the sleep. In fact, maybe we should downgrade the warning to INFO? Or remove it altogether?

I don't think the underlying issue is something we're going to address, because I think it's good that KuduScanner::Close() doesn't block, and AFAICT the only way to "fix" this is for it to block so that destruction of the KuduClient can be deferred to when there are no outstanding RPCs.


PS11, Line 432: const char* const kTableNameArg = "table_name";
              : const char* const kTableNameDesc = "Name of an existing table to use "
              :     "for the test. The test will determine the structure of the table column "
              :     "schema and populate it accordingly.  It's higly recommended to use "
              :     "a dedicated table created just for testing purposes: the test does not "
              :     "delete the inserted rows.";
Nit: seems like an odd placement for this, between two static functions. Why not at the top of the anonymous namespace?


PS11, Line 440: const string
Maybe cref?


Line 446:   InsertLoadgen bench(master_addrs, table_name,
With InsertLoadgen having no external callers, it's worth reevaluating various parts of the abstraction to see if they make sense. For example:
1. What's the point of the multi-arg constructor given that the only invocation passes gflags verbatim into it? Why not just use the gflags directly inside the class?
2. If the class can be made stateless, perhaps it should be converted into a series of free functions?


PS11, Line 466:     ostringstream s;
              :     s << "Encountered " << total_err_count << " errors";
              :     return Status::RuntimeError(s.str());
How about using strings::Substitute here instead?


PS11, Line 479:       ostringstream s;
              :       s << "Row count mismatch: expected " << total_row_count
              :         << "; actual " << count;
              :       return Status::RuntimeError(s.str());
Likewise here.


Line 494:       .Description(
Can you split this up using Description() and LongDescription()? See ksck for an example.


PS11, Line 497: higly
highly


Line 500:       .AddOptionalParameter("test_master_addresses")
Nit: please sort this list alphabetically. Makes it easier to find parameters.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#3).

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................

[benchmarks] introduced insert_loadgen tool

A new simple load generation tool which allows to measure performance
of the Kudu C++ client library in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND
modes for 'push-as-much-as-you-can' scenarios: the client generates
and pushes as much data to tablet servers as it can given the
parameters for buffering/batching.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/insert_loadgen.cc
2 files changed, 414 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#16).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 720 insertions(+), 179 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/16
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/CMakeLists.txt
File src/kudu/tools/CMakeLists.txt:

PS15, Line 108: ADD_KUDU_TEST(kudu-test)
              : ADD_KUDU_TEST_DEPENDENCIES(kudu-test
              :   kudu)
> Why not reuse kudu-tool-test, which is where all of the new CLI tests (apar
Good idea, will do.  I was not sure where to put it; probably I should have asked for an advice.


http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/kudu-test.cc
File src/kudu/tools/kudu-test.cc:

Line 37: 
> These are nice, but can we at least verify that something was written with 
The exact row count is verified by the utility itself if adding '-run_scan' flag (this is so for all tests besides the very first one which runs with default parameters).

Do you mean we want to verify the exact data which has been written?  I.e., read it back and compare with the data which was inserted into the table?


http://gerrit.cloudera.org:8080/#/c/4412/15/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

Line 110: #include "kudu/util/random.h"
> Nit: should precee stopwatch (surprised Tidy Bot didn't mention it?)
Good catch, will update.

So, it seems there is a hope -- not all our jobs are going to disappear due to automation and AI :)

Probably, there is a bug in Tidy Bot.  I'm just speculating here, but it might be related to the fact that this header comes last in the list (it might be off-by-one mistake or alike).


PS15, Line 152: options allows "
              :             "to keep
> Nit: option retains
Done


PS15, Line 156: is not in effect
> Nit: has no effect
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 15
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4412/20/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS20, Line 401: supposed
> Nit: "assumed", right?
Done


PS20, Line 405:   // It's necessary to have read-what-you-write behavior here.  Since
              :   // tablet leader can change and there might be replica propagation delays,
              :   // set the snapshot to the latest observed one to get correct row count.
> As JD discovered with ITClient, if you end up scanning from a replica that 
OK, thanks for the info.  I added retry logic for that.  It looks uglier now, but it should do the job.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Reviewed-on: http://gerrit.cloudera.org:8080/4412
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 751 insertions(+), 183 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 23
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [benchmarks] introduced insert loadgen tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#2).

Change subject: [benchmarks] introduced insert_loadgen tool
......................................................................

[benchmarks] introduced insert_loadgen tool

A new simple load generation tool which allows to measure performance
of the Kudu C++ client library in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND
modes for 'push-as-much-as-you-can' scenarios: the client generates
and pushes as much data to tablet servers as it can given the
parameters for buffering/batching.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/benchmarks/CMakeLists.txt
A src/kudu/benchmarks/insert_loadgen.cc
2 files changed, 413 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4412/11/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

Line 494:       .Description(
> You mean ExtraDescription()?  OK, will do.  I'm also thinking to add exampl
After some consideration I decided not to include the examples because it's not possible to preserve the formatting verbatim.  It requires some additional work to be done, which would constitute a separate changelist :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4412

to look at the new patch set (#21).

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................

[tools] added insert-generated-rows into kudu tools

The insert-generated-rows tool has been merged into the 'kudu'
umbrella toolset.

This addresses KUDU-1628.

Besides, added ability to run multiple inserter threads and
specify additional parameters on batching behavior of the generated
write operations.  It's possible to run data generating sessions
both in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.  Introduced
sequential and random modes for the data generator.  Overall, these
changes allow to use the tool to measure performance of the Kudu C++
client library in simplistic 'push-as-much-as-you-can' scenario:
the client generates and sends data as fast as it can.

Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
---
M src/kudu/tools/CMakeLists.txt
D src/kudu/tools/insert-generated-rows.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action.cc
M src/kudu/tools/tool_action.h
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_fs.cc
M src/kudu/tools/tool_action_local_replica.cc
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_pbc.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tools/tool_action_table.cc
M src/kudu/tools/tool_action_tablet.cc
A src/kudu/tools/tool_action_test.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tools/tool_action_wal.cc
M src/kudu/tools/tool_main.cc
17 files changed, 751 insertions(+), 183 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/4412/21
-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4412/21/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS21, Line 413:  = Status::OK();
Nit: this is the default value, can omit.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 21
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] updated insert-generated-rows tool

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: [tools] updated insert-generated-rows tool
......................................................................


Patch Set 6:

(1 comment)

> (1 comment)
 > 
 > Would be nice to see a small integration test that runs the tool
 > against a cluster then verifies that it indeed inserted rows. Our
 > CLI tools have a tendency to bit-rot because of the lack of tests;
 > we're trying to change that with the new CLI tool.
 > 
 > On that note, what do you think of incorporating this as an action
 > in the new CLI tool? You'll get most of main() for free, but it
 > means this will ship to users. Do you think it'd a useful tool for
 > them?

Good point!  Sure -- I think it's a good idea to provide a means to check that the cluster was configured properly and is able to accept some data.  I will try to put up a small addition to the CLI tools, as you suggested.

http://gerrit.cloudera.org:8080/#/c/4412/6/src/kudu/tools/insert-generated-rows.cc
File src/kudu/tools/insert-generated-rows.cc:

Line 116: DEFINE_string(master_address, "localhost:7051",
> Nit: can we change this to master_addresses?
If we are not afraid of braking compatibility with the old version, then yes.

But Todd told me I'm free to break it! :)

Will change.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Dinesh Bhat (Code Review)" <ge...@cloudera.org>.
Dinesh Bhat has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 12:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/4412/12/src/kudu/tools/tool_action_test.cc
File src/kudu/tools/tool_action_test.cc:

PS12, Line 21: options
s/options/option ?


Line 33: //     --num_threads=1 \
don't you need master_addresses here as required params ? also applicable in examples below i guess.


PS12, Line 75: master server(s) location
as per the code below there is no default_params for master server location right ? so this comment may be invalid.


Line 80: //     bench_c_00_03
Same as above. Also here and above, we could keep this name intuitive to reflect this as a table_name, say table_bench_xx or something.


Line 81: //
I liked the documentation idea of these options, good stuff.


Line 119: using kudu::Schema;
I think Schema is not being used anywhere.


Line 131: using kudu::client::sp::shared_ptr;
Actually a basic Qn here: why do we use this special shared_ptr  as opposed to std::shared_ptr ?


PS12, Line 142: 1000000
Nit: Not sure if you want to insert 1 million by default, if I am a curious user of this tool, I would like to just see few rows inserted unless you want to exercise the buffer limit also by pushing this number higher. If I am a serious user, I know what flag to set. My goal would be to reduce the amount of time taken by this test for a novice/curious user.


PS12, Line 147: inserter
nit: rename it to loadgen threads or load generator threads here and above ?


Line 194:   int64_t NextImpl() {
Do we want to use uint64_t here ? Although Random.Next64() seems to be filling in only lower 62 bits at the moment, it is better to take a safer route here.


PS12, Line 215: != 0
this check is prolly redundant ?


PS12, Line 285: case UNKNOWN_DATA:  // fall-through
redundant ?


Line 302:              FLAGS_buffer_flush_watermark_pct));
I vaguely recall cpp guideline saying next spill being after 4 spaces I guess, however ignore if I am wrong.


PS12, Line 314: FLAGS_num_rows_per_thread
consider using variable FLAGS_ directly to be consistent with other places.


PS12, Line 315: num_rows_per_thread == 0
given it's default value being 1M, what's this check for ?


Line 324:   if (row_count != nullptr) {
DCHECK or CHECK for this, unless we have a use case where it could actually be a nullptr ?


Line 403:   Stopwatch sw(Stopwatch::ALL_THREADS);
Stupid Qn: Does ALL_THREADS here mean all threads on CPU or all threads generated by this process ?  Also consider the alternative LOG_TIMING* here which hides all the Stopwatch details underneath, see rle.cc for an example usage.


PS12, Line 440: optional scan afterwards
I think this need not be made explicit since this is listed in optional param section anyways.


PS12, Line 452: kTableNameArg
Can we not make this table_name optional and create one if the user doesn't specify one ? That way this test can be self-sufficient. Otherwise this test requires the user to have create the table from another tool(or some other means).


PS12, Line 455:   
Nit: Double space here.


Line 457:           "the inserted rows." })
Ideally, we would not want this behavior of test leaving the test data on the server to keep it consistent with rest of the tests(although this is not a GTEST per se). I strongly recommend using either a call to 'kudu table delete' using SubProcess::Call or may be client->DeleteTable(table_name) since we have all the details here to delete a table unless we specify another optional param of keep_table or something alike.


-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [tools] added insert-generated-rows into kudu tools

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [tools] added insert-generated-rows into kudu tools
......................................................................


Patch Set 22: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I332927c0b928c9c4fb81a8e26f5c9ed7565299ad
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <di...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No