You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/06/07 20:10:22 UTC

[kudu-CR] [java] Favor column ids over column names in scan tokens

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13562


Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This changes scan tokens in the Java client to hold column ids, if they
are available, and to use them when being rehydrated into a scanner, if
they are available. The original by-name behavior is the fallback if ids
are not available.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M src/kudu/client/client.proto
3 files changed, 86 insertions(+), 47 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/13562/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 23:29:57 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13562

to look at the new patch set (#5).

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This adjusts the Java client to prefer ids to names, to fix this
behavior. Since this involves including column ids when serializing
columns to PBs as part of scan tokens, but the server does not permit
clients to send column ids in most cases, this patch adds a new
serialization option that includes column ids.

Note that this patch does not make _scanners_ resistant to column name
changes. If a scanner is opened against a table and a column name
changes on a replica before the scanner opens a server-side scanner on
it, the scan will fail if the column is in the projection.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AlterTableOptions.java
M java/kudu-client/src/main/java/org/apache/kudu/client/CreateTableRequest.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/Operation.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKeyEncoding.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
7 files changed, 230 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/13562/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto@58
PS1, Line 58:   repeated uint32 projected_column_ids = 21;
> So the C++ and Java Schema/ColumnSchema code agree on this:
Yeah that's probably a better way to do it. Im gonna have to plumb through all the SCHEMA_PB_{WITH|WITHOUT}_IDS stuff everywhere.



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 08 Jun 2019 02:11:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13562/2/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/13562/2/src/kudu/client/client.proto@58
PS2, Line 58:   repeated uint32 projected_column_ids = 21;
Please make sure this works with diff scans and virtual columns



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 23:10:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13562/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/13562/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@162
PS1, Line 162:  
Extra space here.


http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto@58
PS1, Line 58:   repeated uint32 projected_column_ids = 21;
ColumnSchemaPB already has an id field; why can't we just fill it?



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Fri, 07 Jun 2019 22:00:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 2: Code-Review+1

I'm OK with this approach because it's a reversible decision if we so choose in the future and I would like to get this into 1.10


-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 23:08:48 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 3:

The current revision has a known defect that will be addressed in a follow-up revision.


-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 11 Jun 2019 20:09:03 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13562

to look at the new patch set (#4).

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This adjusts the Java client to prefer ids to names, to fix this
behavior. Since this involves including column ids when serializing
columns to PBs as part of scan tokens, but the server does not permit
clients to send column ids in most cases, this patch adds a new
serialization option that includes column ids.

Note that this patch does not make _scanners_ resistant to column name
changes. If a scanner is opened against a table and a column name
changes on a replica before the scanner opens a server-side scanner on
it, the scan will fail if the column is in the projection.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
3 files changed, 213 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/13562/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13562/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13562/4//COMMIT_MSG@19
PS4, Line 19: Note that this patch does not make _scanners_ resistant to column name
            : changes. If a scanner is opened against a table and a column name
            : changes on a replica before the scanner opens a server-side scanner on
            : it, the scan will fail if the column is in the projection.
Do you think we should address that too (separately)? Seems like unexpected behavior from the user's perspective, no?


http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@373
PS4, Line 373:                                     schema.hasColumnIds() ? schema.getColumnId(columnName) : -1,
Hmm, under what circumstances would the schema not have column IDs?


http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@50
PS4, Line 50:     SCHEMA_PB_WITH_ID
This is confusing; could you also make this WITHOUT and update the "default" conversion paths to use it?



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 11 Jun 2019 23:38:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13562/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13562/4//COMMIT_MSG@19
PS4, Line 19: Note that this patch does not make _scanners_ resistant to column name
            : changes. If a scanner is opened against a table and a column name
            : changes on a replica before the scanner opens a server-side scanner on
            : it, the scan will fail if the column is in the projection.
> Do you think we should address that too (separately)? Seems like unexpected
Yep.


http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@373
PS4, Line 373:                                     schema.hasColumnIds() ? schema.getColumnId(columnName) : -1,
> Hmm, under what circumstances would the schema not have column IDs?
At these callsite? Probably none because the schema comes from the table we just opened. I still think it's nice to handle it.


http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@50
PS4, Line 50:     SCHEMA_PB_WITH_ID
> This is confusing; could you also make this WITHOUT and update the "default
I'm not sure how I feel about that. The name makes it clear what the flag does-- it's not as if true means include and false means exclude for comments and the opposite for IDs. This minimizing the extent of the changes and so minimizes the risk from them. Accidentally including the id can cause server-side RPC failures when the server rejects client schemas with ids.

If you/others still feel strongly about it I can change it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 16:53:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 20:54:38 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13562

to look at the new patch set (#2).

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This changes scan tokens in the Java client to hold column ids, if they
are available, and to use them when being rehydrated into a scanner, if
they are available. The original by-name behavior is the fallback if ids
are not available.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
M src/kudu/client/client.proto
3 files changed, 86 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/13562/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto@58
PS1, Line 58:   repeated uint32 projected_column_ids = 21;
> Because a ColumnSchema doesn't have an id field, and the only purpose of ad
So the C++ and Java Schema/ColumnSchema code agree on this:
1. IDs are stored in a separate collection in Schema rather than in each ColumnSchema.
2. Some Schemas have IDs, others don't.

And yet, when we serialize C++ Schemas to SchemaPB, we also write out the Schema's IDs into the various ColumnSchemaPBs (assuming the schema had IDs and SCHEMA_PB_WITHOUT_IDS wasn't set). Likewise when we deserialize a SchemaPB. Why shouldn't we do the same in Java? It'll be quite confusing to look at projected_column_ids and at projected_columns.<id> and wonder which is used for what, especially from the C++ side where it's totally normal to use projected_columns.<id> in the manner I've described (because of the PB serde functions available in wire_protocol.h).



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 23:10:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This adjusts the Java client to prefer ids to names, to fix this
behavior. Since this involves including column ids when serializing
columns to PBs as part of scan tokens, but the server does not permit
clients to send column ids in most cases, this patch adds a new
serialization option that includes column ids.

Note that this patch does not make _scanners_ resistant to column name
changes. If a scanner is opened against a table and a column name
changes on a replica before the scanner opens a server-side scanner on
it, the scan will fail if the column is in the projection.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Reviewed-on: http://gerrit.cloudera.org:8080/13562
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AlterTableOptions.java
M java/kudu-client/src/main/java/org/apache/kudu/client/CreateTableRequest.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/Operation.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKeyEncoding.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
7 files changed, 229 insertions(+), 67 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13562/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/13562/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@162
PS1, Line 162:  
> Extra space here.
Done


http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto
File src/kudu/client/client.proto:

http://gerrit.cloudera.org:8080/#/c/13562/1/src/kudu/client/client.proto@58
PS1, Line 58:   repeated uint32 projected_column_ids = 21;
> ColumnSchemaPB already has an id field; why can't we just fill it?
Because a ColumnSchema doesn't have an id field, and the only purpose of adding an optional one would be its use in scan tokens (right now). ColumnSchemas don't always have ids associated with them- like when the Schema is built client side. Also, in the Schema class itself, the ids are a separate (possibly null) collection, so this mirrors that setup.



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 22:14:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 18:53:47 +0000
Gerrit-HasComments: No

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@50
PS4, Line 50:     SCHEMA_PB_WITH_ID
> I'm not sure how I feel about that. The name makes it clear what the flag d
My argument is one of consistency with the existing schema PB conversion flag, and consistency with the equivalent C++ code (which defines all of its flags in terms of "without" vs. "with").

I agree that it increases risk in _this_ change, but it makes Kudu's wire protocol schema conversions overall more readable.



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 17:01:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13562

to look at the new patch set (#3).

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................

[java] Favor column ids over column names in scan tokens

Previously, a scan token would use column name to map a column in its
projection to a column in the target table's current schema.
Therefore, a scan token couldn't be used if a column were renamed
between when the token is cut and when it is rehydrated into a scanner.
This adjusts the Java client to prefer ids to names, to fix this
behavior.

A follow-up will add similar capability to the C++ client.

Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestScanToken.java
3 files changed, 101 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/13562/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [java] Favor column ids over column names in scan tokens

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/13562 )

Change subject: [java] Favor column ids over column names in scan tokens
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java:

http://gerrit.cloudera.org:8080/#/c/13562/4/java/kudu-client/src/main/java/org/apache/kudu/client/ProtobufHelper.java@50
PS4, Line 50:     SCHEMA_PB_WITH_ID
> My argument is one of consistency with the existing schema PB conversion fl
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3f05a4175c7e7bfaec2cbd3586723e6de3823f0
Gerrit-Change-Number: 13562
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 18:06:32 +0000
Gerrit-HasComments: Yes