You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Matthew Jacobs (Code Review)" <ge...@cloudera.org> on 2016/10/21 06:07:43 UTC

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4779

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................

IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Kudu provides an API to get/set a 'latest observed
timestamp' on clients to allow a client which inserts to
capture and send this timestamp to another client before a
read to ensure that data as of that timestamp is visible.
This adds support for this feature _for reads within a
session_ by capturing the latest observed timestamp when the
KuduTableSink is sending its last update to the coordinator.
The timestamp is sent with other post-write information, and
is aggregated (i.e. taking the max) at the coordinator. The
max is stored in the session, and that value is then set in
the Kudu client on future scans.

This is being tested by running the Kudu tests after
removing delays that were introduced to work around the
issue that reads might not be visible after a write. Before
this change, if there were no delay, inconsistent results
could be returned.

Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
---
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-exec-state.cc
M common/thrift/ImpalaInternalService.thrift
M tests/common/impala_test_suite.py
10 files changed, 53 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4779/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4779/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1163:   if (kudu_latest_observed_ts > 0) {
> why not carry the > 0 convention and just always set the session state?  lo
Done


http://gerrit.cloudera.org:8080/#/c/4779/3/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 547:     }
> why is it better to do here rather than, say, ImpalaServer::UnregisterQuery
I think this could go in UnregisterQuery too. I put it here because I thought it made more sense to do some post-query accounting in QES rather than UnregisterQuery(), which feels like it should be focusing on unregistering (not that these we are particularly consistent to begin with...).

It shouldn't matter if lock_ is held or not. I should be able to call this after BlockOnWait(), or I can move it to Unregister if you prefer.


-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4779/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

Line 1163:   if (kudu_latest_observed_ts > 0) {
why not carry the > 0 convention and just always set the session state?  looks like scan node already checks > 0 anyway.


http://gerrit.cloudera.org:8080/#/c/4779/3/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 547:     }
why is it better to do here rather than, say, ImpalaServer::UnregisterQuery()? is it important that we're holding QES::lock_?


http://gerrit.cloudera.org:8080/#/c/4779/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 322
nice to see this go


-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Kudu provides an API to get/set a 'latest observed
timestamp' on clients to allow a client which inserts to
capture and send this timestamp to another client before a
read to ensure that data as of that timestamp is visible.
This adds support for this feature _for reads within a
session_ by capturing the latest observed timestamp when the
KuduTableSink is sending its last update to the coordinator.
The timestamp is sent with other post-write information, and
is aggregated (i.e. taking the max) at the coordinator. The
max is stored in the session, and that value is then set in
the Kudu client on future scans.

This is being tested by running the Kudu tests after
removing delays that were introduced to work around the
issue that reads might not be visible after a write. Before
this change, if there were no delay, inconsistent results
could be returned.

Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Reviewed-on: http://gerrit.cloudera.org:8080/4779
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-exec-state.cc
M common/thrift/ImpalaInternalService.thrift
M tests/common/impala_test_suite.py
M tests/query_test/test_kudu.py
11 files changed, 51 insertions(+), 7 deletions(-)

Approvals:
  Matthew Jacobs: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has uploaded a new patch set (#3).

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................

IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Kudu provides an API to get/set a 'latest observed
timestamp' on clients to allow a client which inserts to
capture and send this timestamp to another client before a
read to ensure that data as of that timestamp is visible.
This adds support for this feature _for reads within a
session_ by capturing the latest observed timestamp when the
KuduTableSink is sending its last update to the coordinator.
The timestamp is sent with other post-write information, and
is aggregated (i.e. taking the max) at the coordinator. The
max is stored in the session, and that value is then set in
the Kudu client on future scans.

This is being tested by running the Kudu tests after
removing delays that were introduced to work around the
issue that reads might not be visible after a write. Before
this change, if there were no delay, inconsistent results
could be returned.

Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
---
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-exec-state.cc
M common/thrift/ImpalaInternalService.thrift
M tests/common/impala_test_suite.py
M tests/query_test/test_kudu.py
11 files changed, 55 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4779/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4779/3/be/src/service/query-exec-state.cc
File be/src/service/query-exec-state.cc:

Line 547:     }
> I think this could go in UnregisterQuery too. I put it here because I thoug
I don't have a strong preference either way -- mostly wondering what the reasoning was.  The division between the two don't seem very clear (i.e. its not clear why some stuff in Unregister() isn't in Done()), but I agree Unregister() doesn't seem any better.


-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


Patch Set 5: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................


Patch Set 5: Code-Review+2

rebase

-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................

IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Kudu provides an API to get/set a 'latest observed
timestamp' on clients to allow a client which inserts to
capture and send this timestamp to another client before a
read to ensure that data as of that timestamp is visible.
This adds support for this feature _for reads within a
session_ by capturing the latest observed timestamp when the
KuduTableSink is sending its last update to the coordinator.
The timestamp is sent with other post-write information, and
is aggregated (i.e. taking the max) at the coordinator. The
max is stored in the session, and that value is then set in
the Kudu client on future scans.

This is being tested by running the Kudu tests after
removing delays that were introduced to work around the
issue that reads might not be visible after a write. Before
this change, if there were no delay, inconsistent results
could be returned.

Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
---
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-exec-state.cc
M common/thrift/ImpalaInternalService.thrift
M tests/common/impala_test_suite.py
10 files changed, 53 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4779/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has uploaded a new patch set (#4).

Change subject: IMPALA-3788: Support for Kudu 'read-your-writes' consistency
......................................................................

IMPALA-3788: Support for Kudu 'read-your-writes' consistency

Kudu provides an API to get/set a 'latest observed
timestamp' on clients to allow a client which inserts to
capture and send this timestamp to another client before a
read to ensure that data as of that timestamp is visible.
This adds support for this feature _for reads within a
session_ by capturing the latest observed timestamp when the
KuduTableSink is sending its last update to the coordinator.
The timestamp is sent with other post-write information, and
is aggregated (i.e. taking the max) at the coordinator. The
max is stored in the session, and that value is then set in
the Kudu client on future scans.

This is being tested by running the Kudu tests after
removing delays that were introduced to work around the
issue that reads might not be visible after a write. Before
this change, if there were no delay, inconsistent results
could be returned.

Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
---
M be/src/exec/kudu-scan-node.cc
M be/src/exec/kudu-table-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/service/query-exec-state.cc
M common/thrift/ImpalaInternalService.thrift
M tests/common/impala_test_suite.py
M tests/query_test/test_kudu.py
11 files changed, 51 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/4779/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4779
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6bcb5fc218ad4ab935343a55b2188441d8c7dfbd
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>