You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2017/04/13 05:44:54 UTC

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

David Ribeiro Alves has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6624

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................

WIP: Expose a way to set "advanced" non-types scan options

This adds a new method to KuduScanner that allows to set
"advanced" scan options just by providing a pair of key
and value strings.

The main target for this feature is to allow Impala scanners
to specify that padding is required for UNIXTIME_MICROS columns
without "polluting" the api and in such a way that we can remove
support for this in the future.

This api could potentially be used to set "advanced" scanner hints
in the future.

WIP: because this needs a directed end-to-end test, even though
the wire-protocol itself has been tested in a previous patch.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
7 files changed, 68 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] WIP: Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose row format flags in KuduScanner
......................................................................


Patch Set 6: Verified+1

Unrelated java failure

-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has uploaded a new patch set (#2).

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................

WIP: Expose a way to set "advanced" non-types scan options

This adds a new method to KuduScanner that allows to set
"advanced" scan options just by providing a pair of key
and value strings.

The main target for this feature is to allow Impala scanners
to specify that padding is required for UNIXTIME_MICROS columns
without "polluting" the api and in such a way that we can remove
support for this in the future.

This api could potentially be used to set "advanced" scanner hints
in the future.

WIP: because this needs a directed end-to-end test, even though
the wire-protocol itself has been tested in a previous patch.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
7 files changed, 68 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6624/9/src/kudu/client/client.h
File src/kudu/client/client.h:

PS9, Line 2032: bit set
nit: I think 'bitset' is easier to understand


Line 2046:   /// format and the default will be used.
add a note that some flags may require server-side support, and thus the caller should be prepared to handle a NotSupported status in Open() and NextBatch()?


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/12/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS12, Line 1357:   if (PREDICT_FALSE(data_->configuration().row_format_flags() != KuduScanner::NO_FLAGS)) {
               :     return Status::IllegalState(
               :         Substitute("Cannot extract rows. Row format modifier flags were selected: $0",
               :                    data_->configuration().row_format_flags()));
               :   }
> If I'm not mistaken, to have that idempotent behavior Adar mentioned, it's 
makes sense. Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 8:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS8, Line 1247: formal
> typo
Done


PS8, Line 1249:   switch (flags) {
              :     case NO_FLAGS:
              :     case PAD_UNIXTIME_MICROS_TO_16_BYTES:
              :       break;
              :     default:
              :       return Status::InvalidArgument(Substitute("Invalid row format flags: $0", flags));
              :   }
> nit: maybe, put this 'more static' check prior to the 'if (data_->open_)' o
Done


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/client.h
File src/kudu/client/client.h:

PS8, Line 2065: int64_t
> btw, 1UL << 32 would not fit into the RowFormatFlags enum.  It would be nec
ah, good catch. changed these to static const fields


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

PS8, Line 306: int64_t
> nit: if it's not changing during lifetime of the object, consider making ad
This is changed on Reset() so it can't be const


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/schema.h
File src/kudu/client/schema.h:

Line 218: 
> nit: was this line added intentionally?
Done


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/integration-tests/all_types-itest.cc
File src/kudu/integration-tests/all_types-itest.cc:

PS8, Line 98: > >
> Nit: with C++11 we no longer need to separate triangular brackets with spac
Done


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/scanners-test.cc
File src/kudu/tserver/scanners-test.cc:

PS8, Line 43: 0
> nit: why not to use the new shiny NO_FLAGS constant?
Done


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

PS8, Line 333: int64_t
> nit: since it is not changed during lifetime of the Scanner, maybe make it 
Done


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

PS8, Line 274: bitset in this int64
> If it's a bitset, then why not uint64?  Unsigned types look like a natural 
we've been generally trying to avoid unsigned ints, but since this is not a counter or a number I thing you're right. I've changed this to unsigned


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6624/4/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

PS4, Line 184:  if (strcasecmp(name.c_str(), kPadUnixTimeMicrosScanOption)) {
here and below, this isn't going to do what you want it to. strcasecmp returns 0 if they're equal


PS4, Line 186: pad_unixtime_micros_for_impala_
return OK after this and below, otherwise you'll return the error on l195


PS4, Line 187: true
false?

also compare is used wrong


http://gerrit.cloudera.org:8080/#/c/6624/4/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 377:   // Whether the server supports padding UNIXTIME_MICROS slots.
something doesn't seem hooked up correctly, when I play with this I'm getting

Remote error: unsupported feature flags


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6624/9/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS9, Line 546: LOG(FATAL)
Maybe, return Status from this method instead of aborting here?  That would be cleaner, IMO: all other places where the method is used return Status themselves, so it's easy to update.

BTW, this method is also used by the kudu tool -- not crashing in such cases would give better impression for the end-users, I think.


http://gerrit.cloudera.org:8080/#/c/6624/10/src/kudu/tserver/scanners-test.cc
File src/kudu/tserver/scanners-test.cc:

PS10, Line 23: #include "kudu/tserver/tserver.pb.h"
             : #include "kudu/tserver/scanner_metrics.h"
nit: re-order these lines to be sorted properly.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
> This is confusing. If the idea is to provide a constant that logically mean
yea, I'd expect the default to be 0


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 182:   raw_mode_flags_ = flags;
we should probably check that there are no unsupported flags set


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

Line 139:     return raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags;
strikes me as odd, since this is -1 (all flags set)


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS5, Line 536: raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags 
yea, this defaultRawModeFlags thing is funny looking here - I'd expect the default "0" to mean not enabled, and maybe allocate a flag called 'RAW_MODE_V1" or something? That way if we ever changed our internal format, we could switch to setting RAW_MODE_V2 instead so the client can indicate its compatibility with the current version


Line 550:     LOG(FATAL) << "Cannot extract rows, \"raw\" mode";
would it be possible for a user in a release build to hit this case? Or would this really be indicative of a bug on our side?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
> Why is this a ScannerManager thing? I thought it'd only be per scanner?
yea


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> This is unintuitive; since collectors are constructed per RPC, I would have
would it be simpler if we pass the raw-mode flags on every ContinueScan RPC rather than storing it as part of the scanner object? Then we would be able to avoid the changes to the Scanner interface, as well as solve this problem


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 270:   // "Raw" mode flags.
> I understand the desire for a "raw" mode client-side, so that users who sig
+1, it shouldn't distinguish raw vs not-raw. At this level all scans are kind of "raw"

Perhaps it should be called "format_flags" or something? As for a flag bitset vs separate booleans I dont have a strong opinion. I suppose separate booleans would be less dense in memory than a bitset, but doubt it matters.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#13).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 387 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> I agree with the first portion of the soap box - I would prefer an API that
Thanks for weighing in.

I have nothing against the enum, personally, just seems limiting from an API perspective as it doesn't allow to set a value and thus reduces the usefulness of the api itself. for instance say that we would want in the future to enable users to set the size of the row block for scans, this would need a value.

If we're going the enum route maybe we should just have "pad_timestamp_16_bytes" getters and setters in KuduScanner and be done with it.

Regarding the second part of the soapbox, I agree with todd. As long as it's pretty clear what is the api contract, from an evolution perspective, then users can choose to take in the risk or not. Finally I would say that while we might not easily accept these kinds from any random contributor I would say we would definitely consider them seriously for any big Kudu consumer, be it Impala or Spark or any other well known compute/query layer.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> Regarding this advanced option API, is an enum constant strong enough? I un
Likely a run-time error, as before. In any case I don't think we should the current structure (SetAdvancedScanOption) if we're only going to allow it to set boolean values through an enum. Much rather then having a proper method whose removal fails outright at both compile and link time.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#10).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 383 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> Thanks for weighing in.
Regarding this advanced option API, is an enum constant strong enough? I understand how it will surface an API removal as a breakage at compile time, but if an existing application binary is used, will the removal manifest as a link time error or a run time error?

Regarding API development, I had drafted a response comparing the Linux syscall API to Guava and Hadoop's APIs, but I don't think it'll be a productive discussion. It's clear to me that I would prefer if unstable APIs in Kudu were highly discouraged if not prohibited outright, and if Kudu's API culture were more egalitarian (i.e. consumer "weight" didn't play a role in dictating exceptions from the "no unstable APIs" rule). But, I appear to be in the minority, and this isn't the right forum for that discussion. So consider it dropped.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> Yea, in this case (and in many similar cases) the flag is really just passe
So if I may try and restate your argument:
1. This option only makes sense if passed through directly to the server (I agree that implementing a compatibility layer doesn't make sense).
2. As such, an application making use of it must already contend with a "not supported" error should it contact an old server.
3. Thus, the presence of an additional "not supported" error when a future _client library_ doesn't support the option is not an imposition, because a well-behaved application will have handled it anyway as per #2.

I can buy that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

PS4, Line 184:  if (strcasecmp(name.c_str(), kPadUnixTimeMicrosScanOption)) {
> yeah, I noticed, just trying to get feedback on the API. doesn't seem like 
yeah, I figured, though I'd like to get coding even if the API changes

I'm working around these issues for now


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS6, Line 1247: Raw mode
> "Row format flags"
Done


PS6, Line 1254: raw mode
> "row format"
Done


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/client.h
File src/kudu/client/client.h:

PS6, Line 2053: 0;
> Should this be NO_FLAGS?
Done


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 545:   if (row_format_flags_ != KuduScanner::NO_FLAGS) {
> Why is this check a LOG(FATAL) one here but a DCHECK_EQ() one in KuduScanBa
This one is not on a hot path while the other one is. I'd prefer to have it fatal there too, but I explicitly called out that the client might crash _or_ return wrong results.


PS6, Line 546: Format modifier
> Row format
Done


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

PS6, Line 272: modifiers
> flags
Done


Line 304:   int64_t row_format_flags_;
> Nit: since every other data member here is documented, could you document t
Done


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

PS6, Line 274: IN
> Nit: in (or did you capitalize this for a reason?)
hum, no, weird.  Done


PS6, Line 275: 0
> Shouldn't this be RowFormatFlags::NO_FLAGS?
I had tried that. protoc doesn't let me assign an enum to an int64. Did point that this value corresponds to NO_FLAGS thohd


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 13
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: Expose row format flags in KuduScanner

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose row format flags in KuduScanner
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS6, Line 1247: Raw mode
"Row format flags"


PS6, Line 1254: raw mode
"row format"


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/client.h
File src/kudu/client/client.h:

PS6, Line 2053: 0;
Should this be NO_FLAGS?


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 545:   if (row_format_flags_ != KuduScanner::NO_FLAGS) {
Why is this check a LOG(FATAL) one here but a DCHECK_EQ() one in KuduScanBatch::Data::row()?


PS6, Line 546: Format modifier
Row format


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

PS6, Line 272: modifiers
flags


Line 304:   int64_t row_format_flags_;
Nit: since every other data member here is documented, could you document this too?


http://gerrit.cloudera.org:8080/#/c/6624/6/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

PS6, Line 274: IN
Nit: in (or did you capitalize this for a reason?)


PS6, Line 275: 0
Shouldn't this be RowFormatFlags::NO_FLAGS?


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has uploaded a new patch set (#8).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/client/schema.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
14 files changed, 382 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> I am opposed to introducing an API that's weakly structured in this way.
I agree with the first portion of the soap box - I would prefer an API that uses named enum constants, like:

SetAdvancedOption(PAD_TIMESTAMP_16_BYTES);

so that we get the benefits of type safety, documentation, etc.


On the second part of the soapbox, I don't think we should constrain ourselves that all APIs need to have an equal stability/compatibility guarantee. As I think we all agree, maintaining a stable API is costly since it involves testing, generates back-compat code concerns, etc. That's why we always try to make the APIs as narrow as possible, think through them up front, etc.

That said, I think there's a case to be made that there are some advanced APIs where we can save a lot of cost (docs, compat, etc) by signing up for a lighter compatibility contract. It does force consumers to make the choice to tighten the compatibility matrix (eg Impala 2.9 would only support Kudu 1.4 or whatever), but so long as this restriction is advertised up front, I think that's OK. The consumer can then weigh the advantages of using the API (in this case, performance) against the disadvantages (having to tightly couple versions).

So the remaining question is what the decision framework is for whether an API should be stable/documented/etc vs unstable/etc. I think for me it comes down to how many consumers we expect to have, and how advanced they are. In the case of a query engine like Impala building an integration with Kudu, we already expect them to be closely following Kudu development (e.g every release we add new predicate types which Impala adds pushdown for). And we expect them to be very advanced consumers (they already rely on the byte format of our cells to get good performance). So adding one more advanced API that they need to dig into the code a bit to use isn't problematic in my view.

Meanwhile, users who are more concerned with ease of use and version flexibility (probably the vast majority of users) can stick with our nice documented stable API. Even if we documented and provided guarantees for this new one, I doubt most users would want to use it.

Thoughts?


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 8:

(8 comments)

Just skimmed over the code to get more context on the new feature.

http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS8, Line 1247: formal
typo


PS8, Line 1249:   switch (flags) {
              :     case NO_FLAGS:
              :     case PAD_UNIXTIME_MICROS_TO_16_BYTES:
              :       break;
              :     default:
              :       return Status::InvalidArgument(Substitute("Invalid row format flags: $0", flags));
              :   }
nit: maybe, put this 'more static' check prior to the 'if (data_->open_)' one?


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/client.h
File src/kudu/client/client.h:

PS8, Line 2065: int64_t
btw, 1UL << 32 would not fit into the RowFormatFlags enum.  It would be necessary to switch to C++11 enums style like 'enum X : uint64_t' to accommodate those values or use some compiler flags, which might break ABI compatibility.  Why to have this parameter of 8 byte type then?


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

PS8, Line 306: int64_t
nit: if it's not changing during lifetime of the object, consider making adding 'const' modifier.


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/client/schema.h
File src/kudu/client/schema.h:

Line 218: 
nit: was this line added intentionally?


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/scanners-test.cc
File src/kudu/tserver/scanners-test.cc:

PS8, Line 43: 0
nit: why not to use the new shiny NO_FLAGS constant?


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

PS8, Line 333: int64_t
nit: since it is not changed during lifetime of the Scanner, maybe make it constant?


http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

PS8, Line 274: bitset in this int64
If it's a bitset, then why not uint64?  Unsigned types look like a natural choice for bitset, especially if using the highest bit as well.  It also helps if using std::bitset in the C++ code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6624/9/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS9, Line 546: LOG(FATAL)
> Maybe, return Status from this method instead of aborting here?  That would
This is undue use of the API as I mention in the comments in KuduScanner in client.h. I'd rather have a FATAL that makes the problem obvious instead of a status that can be ignored since there's nothing the user can do to recover this row data.


http://gerrit.cloudera.org:8080/#/c/6624/10/src/kudu/tserver/scanners-test.cc
File src/kudu/tserver/scanners-test.cc:

PS10, Line 23: #include "kudu/tserver/tserver.pb.h"
             : #include "kudu/tserver/scanner_metrics.h"
> nit: re-order these lines to be sorted properly.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
......................................................................


Patch Set 5:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5//COMMIT_MSG
Commit Message:

PS5, Line 10: should only
> Is this a stylistic choice? Or is it by necessity? Meaning, is it impossibl
I changed this according to feedback.


PS5, Line 10: retreived
> retrieved
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
> This is confusing. If the idea is to provide a constant that logically mean
k, changed this to 0, changed EnableRawModeWithFlags to SetRawModeFlags


Line 2018:   enum { kDefaultRawModeFlags = -1 };
> yea, I'd expect the default to be 0
Done


Line 2028:     PAD_UNIXTIME_MICROS_TO_16_BYTES = 1
> The idea is for these flags not to be mutually exclusive, right? If so, the
yeah, it's just that at this point we only have one. I'll use that notation to make it clear.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 31: using strings::Substitute;
> Not used?
Done


Line 182:   raw_mode_flags_ = flags;
> we should probably check that there are no unsupported flags set
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.h
File src/kudu/client/scan_configuration.h:

Line 139:     return raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags;
> strikes me as odd, since this is -1 (all flags set)
yeah, was kind of thinking of using the first (sign) bit to distinguish between raw mode and non-raw mode. changed this to 0 anyway.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 245:       (configuration().raw_mode_flags() & KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES) ==
> == shouldn't be necessary.
Done


PS5, Line 536: raw_mode_flags_ != KuduScanner::kDefaultRawModeFlags 
> yea, this defaultRawModeFlags thing is funny looking here - I'd expect the 
I've changed this to NO_FLAGS. I don't think we need to go into format versioning since it's not really required for this patch or presently. If we ever add it in the future we can add a flag for the old format (i.e. RAW_MODE_V1 as you suggest) and have NO_FLAGS mean the unaltered "client supported" format. Maybe it would make sense then to also have ROW_FORMAT_V2 so that you could padd other format modifiers with both versions, but again I don't think we need to handle that right now. and the current impl doesn't preclude adding that later.


Line 550:     LOG(FATAL) << "Cannot extract rows, \"raw\" mode";
> would it be possible for a user in a release build to hit this case? Or wou
Users can hit this if they set a modifier flag but then use KuduScanner::NextBatch(vector<KuduRowResult>* rows) to decode the rows.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.h
File src/kudu/client/scanner-internal.h:

Line 256:   Status Reset(rpc::RpcController* controller,
> warning: function 'kudu::client::KuduScanBatch::Data::Reset' has a definiti
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
> yea
oops. Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 342:   virtual void set_raw_mode_flags(int64_t raw_mode_flags) {}
> If ScanResultCollector is to be a generic "interface" as its class comment 
I was torn about this. I pondered the alternative you suggest, but I don't like that I'd have to have either checksum collector either implement this as a no op or have to make this return a status. Both options seem worse than having an empty impl here, IMO


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> This is unintuitive; since collectors are constructed per RPC, I would have
Done


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
> would it be simpler if we pass the raw-mode flags on every ContinueScan RPC
In order to do that we'd have to move the field from NewScanRequestPB to ScanRequestPB which I don't think makes much sense. Particularly because, as we add flags, we'd have to perform all the necessary validation once per continue scan request versus once per new scan.


Line 411:     if (raw_mode_flags == -1) return;
> This special case won't be necessary if the default value is 0 rather than 
Done


Line 413:         == RawModeFlags::PAD_UNIX_TIME_MICROS_TO_16_BYTES) {
> Why is the == necessary? If you mask one bit flag from raw_mode_flags, the 
Done


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 270:   // "Raw" mode flags.
> +1, it shouldn't distinguish raw vs not-raw. At this level all scans are ki
The distinction between raw and non-raw is gone.

I feel like we went back and forth regarding this enough and progress needs to be made.

When we first discussed the bitset solution it was mentioned that it could be useful if in the future we wanted to add new flags for things like row format versions without changing the protocol, which this solution supports. None of this is ideal honestly, but I'm personally not a fan of adding a TimestampPaddingMode either, it feels too "first class" for what this feature currently is.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#5).

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
......................................................................

WIP: Expose "raw" mode in KuduScanner and allow to pass flags

This adds a way to set a KuduScanner to "raw" mode. In this mode
the data should only be retreived through the direct_data() and
indirect_data() in KuduScanBatch.

It also adds a way to pass "flags" encoded as bits in an int64_t
'raw_mode_flags_' var.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left.

WIP: Working on a directed test, just making sure I didn't
break anything with the piping.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 160 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#3).

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................

WIP: Expose a way to set "advanced" non-types scan options

This adds a new method to KuduScanner that allows to set
"advanced" scan options just by providing a pair of key
and value strings.

The main target for this feature is to allow Impala scanners
to specify that padding is required for UNIXTIME_MICROS columns
without "polluting" the api and in such a way that we can remove
support for this in the future.

This api could potentially be used to set "advanced" scanner hints
in the future.

WIP: because this needs a directed end-to-end test, even though
the wire-protocol itself has been tested in a previous patch.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
7 files changed, 68 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#12).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 387 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#11).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 383 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/12/src/kudu/client/client.cc
File src/kudu/client/client.cc:

PS12, Line 1357:   if (PREDICT_FALSE(data_->configuration().row_format_flags() != KuduScanner::NO_FLAGS)) {
               :     return Status::IllegalState(
               :         Substitute("Cannot extract rows. Row format modifier flags were selected: $0",
               :                    data_->configuration().row_format_flags()));
               :   }
If I'm not mistaken, to have that idempotent behavior Adar mentioned, it's necessary to place this before NextBatch() call.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has submitted this change and it was merged.

Change subject: Expose row format flags in KuduScanner
......................................................................


Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Reviewed-on: http://gerrit.cloudera.org:8080/6624
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 387 insertions(+), 69 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 8: Code-Review+1

(1 comment)

Will defer to Alexey or Todd for the final review.

http://gerrit.cloudera.org:8080/#/c/6624/8/src/kudu/integration-tests/all_types-itest.cc
File src/kudu/integration-tests/all_types-itest.cc:

PS8, Line 98: > >
Nit: with C++11 we no longer need to separate triangular brackets with spaces.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has uploaded a new patch set (#7).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/client/schema.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
14 files changed, 382 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose "raw" mode in KuduScanner and allow to pass flags

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose "raw" mode in KuduScanner and allow to pass flags
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6624/5//COMMIT_MSG
Commit Message:

PS5, Line 10: retreived
retrieved


PS5, Line 10: should only
Is this a stylistic choice? Or is it by necessity? Meaning, is it impossible to safely retrieve data from a "normal" scanner via direct_data()/indirect_data()?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/client.h
File src/kudu/client/client.h:

Line 2018:   enum { kDefaultRawModeFlags = -1 };
This is confusing. If the idea is to provide a constant that logically means "raw mode on but no flags", it should have a value 0 and be part of the RawModeFlags enum. But if it means "no raw mode at all"...isn't that the state of the world if you don't call EnableRawModeWithFlags()?

Or is this some internal thing? In which case, why is it here at all?


Line 2028:     PAD_UNIXTIME_MICROS_TO_16_BYTES = 1
The idea is for these flags not to be mutually exclusive, right? If so, the value of the first one should be "1 << 0" so it's clear that the next ones should be "1 << 1", "1 << 2", etc. That's how they can be bitflags and OR'ed together.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

Line 31: using strings::Substitute;
Not used?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

Line 245:       (configuration().raw_mode_flags() & KuduScanner::PAD_UNIXTIME_MICROS_TO_16_BYTES) ==
== shouldn't be necessary.


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/scanners.h
File src/kudu/tserver/scanners.h:

Line 100:   int64_t raw_mode_flags() const { return raw_mode_flags_; }
Why is this a ScannerManager thing? I thought it'd only be per scanner?


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 342:   virtual void set_raw_mode_flags(int64_t raw_mode_flags) {}
> warning: parameter 'raw_mode_flags' is unused [misc-unused-parameters]
If ScanResultCollector is to be a generic "interface" as its class comment suggests, it shouldn't provide this default no-op implementation.


Line 410:   void set_raw_mode_flags(int64_t raw_mode_flags) override {
This is unintuitive; since collectors are constructed per RPC, I would have expected the right value for pad_unixtime_micros_to_16_bytes_ to be provided directly to the constructor. But I think I understand why: the collector is stack-allocated before fetching the Scanner, and reversing the order would require heap-allocating the collector.

Maybe you can try to explain this chicken-and-egg problem somewhere in the collector comments so it's clear for others?


Line 411:     if (raw_mode_flags == -1) return;
This special case won't be necessary if the default value is 0 rather than -1.


Line 413:         == RawModeFlags::PAD_UNIX_TIME_MICROS_TO_16_BYTES) {
Why is the == necessary? If you mask one bit flag from raw_mode_flags, the result will be either non-zero (requested) or zero (not requested).


http://gerrit.cloudera.org:8080/#/c/6624/5/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 270:   // "Raw" mode flags.
I understand the desire for a "raw" mode client-side, so that users who sign up for timestamp padding are forced to also call  direct_data()/indirect_data() instead of row-by-row access.

But why is it exposed to the server? AFAICT, there's no server-side effect. The only thing that matters is the timestamp padding, and that can be exposed via something like TimestampPaddingMode (similar to ReadMode or OrderMode).


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> So if I may try and restate your argument:
Yep, good summary.

I would guess that many clients will be "badly behaved" if they can mandate the server version (eg Impala may decide it's better for them to just require Kudu 1.4 rather than support back-compat, after weighing the cost of maintaining compat). I'm happy to leave that choice up to consumers.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6624/9/src/kudu/client/client.h
File src/kudu/client/client.h:

PS9, Line 2032: bit set
> nit: I think 'bitset' is easier to understand
Done


Line 2046:   /// format and the default will be used.
> add a note that some flags may require server-side support, and thus the ca
added this verbatim.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 7: Verified+1

unrelated failure

-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6624/7/src/kudu/integration-tests/all_types-itest.cc
File src/kudu/integration-tests/all_types-itest.cc:

Line 410:   Status VerifyRows(ScannerSetup scanner_setup, RowVerifier verifier) {
> warning: the parameter 'scanner_setup' is copied for each invocation but on
Done


Line 410:   Status VerifyRows(ScannerSetup scanner_setup, RowVerifier verifier) {
> warning: the parameter 'verifier' is copied for each invocation but only us
Done


Line 450:   void RunTest(ScannerSetup scanner_setup, RowVerifier verifier) {
> warning: the parameter 'scanner_setup' is copied for each invocation but on
Done


Line 450:   void RunTest(ScannerSetup scanner_setup, RowVerifier verifier) {
> warning: the parameter 'verifier' is copied for each invocation but only us
Done


http://gerrit.cloudera.org:8080/#/c/6624/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

Line 348:   virtual void set_row_format_flags(int64_t row_format_flags) {}
> warning: parameter 'row_format_flags' is unused [misc-unused-parameters]
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4/src/kudu/client/scan_configuration.cc
File src/kudu/client/scan_configuration.cc:

PS4, Line 184:  if (strcasecmp(name.c_str(), kPadUnixTimeMicrosScanOption)) {
> here and below, this isn't going to do what you want it to. strcasecmp retu
yeah, I noticed, just trying to get feedback on the API. doesn't seem like we're going this route anyway


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
I am opposed to introducing an API that's weakly structured in this way.

Generally speaking, string-based key/value interfaces like this are attractive to library developers because the perceived cost of adding or removing an API is low. In reality, these APIs are actually more hostile to application developers than their strongly typed equivalents, for several reasons:
1. It's generally harder to figure out what the legal combinations of keys and values are. That is, it's harder to sift through documentation and/or string constants than through a set of well defined and strongly typed functions.
2. If an application tries to use an API that has been removed, the error will surface at run time instead of link (or compile) time.

I don't think removing a strongly typed API is any easier or harder than an API like this one: in either case, there must be a well-defined and well-publicized period of deprecation followed by the removal of the API.

Now, since I appear to be standing on a soap box, I feel compelled to continue my ranting. I appreciate that Impala is one of a small set of major applications that use Kudu, and as such, I also appreciate that many of our new APIs are driven Impala's use cases. But, I'm getting tired of seeing APIs defined in a sorta-hidden or you-need-to-know-the-undocumented-format kind of way. We wouldn't tolerate short cuts like these if they were proposed on behalf of github.com/someguy/mycooldataapp; we shouldn't tolerate them for a mature application like Impala either.

Moreover, we can't assume that users will always deploy Impala and Kudu in some predefined lockstep configuration; these are both Apache projects and users will try whatever combinations they see fit. As such, when we add APIs for Impala, we should intend for them to last forever like any of our APIs, and we should put enough forethought into them to stand by that commitment.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#6).

Change subject: WIP: Expose row format flags in KuduScanner
......................................................................

WIP: Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

WIP: Addressed all the comments but still working on on the
test.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
12 files changed, 173 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4/src/kudu/tserver/tserver.proto
File src/kudu/tserver/tserver.proto:

Line 377:   // Whether the server supports padding UNIXTIME_MICROS slots.
> something doesn't seem hooked up correctly, when I play with this I'm getti
Yeah this isn't hooked up in TabletServiceImpl yet


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#4).

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................

WIP: Expose a way to set "advanced" non-types scan options

This adds a new method to KuduScanner that allows to set
"advanced" scan options just by providing a pair of key
and value strings.

The main target for this feature is to allow Impala scanners
to specify that padding is required for UNIXTIME_MICROS columns
without "polluting" the api and in such a way that we can remove
support for this in the future.

This api could potentially be used to set "advanced" scanner hints
in the future.

WIP: because this needs a directed end-to-end test, even though
the wire-protocol itself has been tested in a previous patch.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
9 files changed, 101 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
Hello Adar Dembo,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6624

to look at the new patch set (#9).

Change subject: Expose row format flags in KuduScanner
......................................................................

Expose row format flags in KuduScanner

This adds a way to pass row format modifier flags to KuduScanner,
encoded in an int64_t as a bitset.

This API is marked as advanced and it's explicitely called
out that the user is responsible for knowing the row data format
that results from setting such flags and decoding the row data.

The only use for these flags, presently, is to set
PAD_UNIXTIME_MICROS_TO_16_BYTES, making sure that the server
pads slots for UNIXTIME_MICROS with an additional 8 bytes to
the left. In the future we might use these flags to (temporarily)
provide old/new row formats without having to change all the
row apis at once.

This adds a new test to all_types-itest that tests reading data
with the padding. The test scans a random projection to test
the padded column(s) in different positions.

Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
---
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/scan_configuration.cc
M src/kudu/client/scan_configuration.h
M src/kudu/client/scanner-internal.cc
M src/kudu/client/scanner-internal.h
M src/kudu/integration-tests/all_types-itest.cc
M src/kudu/tools/tool_action_remote_replica.cc
M src/kudu/tserver/scanners-test.cc
M src/kudu/tserver/scanners.cc
M src/kudu/tserver/scanners.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tserver.proto
13 files changed, 380 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6624/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Expose row format flags in KuduScanner

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Expose row format flags in KuduScanner
......................................................................


Patch Set 8: Verified+1

unrelated ITClient test failure

-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: Expose a way to set "advanced" non-types scan options

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: WIP: Expose a way to set "advanced" non-types scan options
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6624/4//COMMIT_MSG
Commit Message:

PS4, Line 15: without "polluting" the api and in such a way that we can remove
            : support for this in the future.
> Likely a run-time error, as before. In any case I don't think we should the
Yea, in this case (and in many similar cases) the flag is really just passed down to the server, in which case it's perfectly likely that the server is allowed to reject it even if the client library has accepted it. So, the client can produce runtime "not-supported" errors, even if the consumer of the API linked against an appropriate client version.

Certainly the client could _in theory_ do a compatibility layer by which it notices that the server doesn't support the flag, and instead does a full memcpy/re-padding of the scanner data, so that it implements the requested format. However that would be at a very high perf cost, and I think it would probably be preferable to just tell the consumer that the format is not supported, rather than have a relatively-silent perf issue that might never be discovered.

If you want to take the Linux analogy, I'd say this is similar to ioctl() or setsockopt() or other various "more flexible" APIs where the feature may not be supported in a particular runtime environment (filesystem, network socket type, etc)

Another analogy to other software: the JDK has plenty of "undocumented" APIs which can be useful to advanced users who are willing to make the tradeoff that they might go away at some point. sun.misc.Unsafe is the most commonly-used example, but there are plenty of other cases where I've seen people downcast in order to do certain tricky things that are outside the scope of what the platform provider wants to generally support.


-- 
To view, visit http://gerrit.cloudera.org:8080/6624
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I043b6514dc5fc307fc9c94eb41f3ae79796ba273
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes