You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "John Sherman (Code Review)" <ge...@cloudera.org> on 2021/03/01 17:08:56 UTC

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Hello Aman Sinha,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/17144

to review the following change.


Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 93 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/5/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS5, Line 245:       exprs.addAll(outputExprs_.subList(0, targetTable_.getNonClusteringColumns().size()));
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:17:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#2).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 111 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/3/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS3, Line 245:       exprs.addAll(outputExprs_.subList(0, targetTable_.getNonClusteringColumns().size()));
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Sun, 07 Mar 2021 16:37:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG@14
PS1, Line 14: - External frontends are responsible for managing the files after the
            :   query is completed.
> I wonder how this would interact with the "retry_failed_queries" query opti
I think the newest version of the patch doesn't try to retry failed queries when there is a result sink.


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@798
PS1, Line 798: Coordinator::FinalizeResultSink()
> I might be missing something, but where is this called?
Great catch - I think some code got lost in a rebase. Will show up in next patch (in Wait()).


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@907
PS1, Line 907:   } else if (stmt_type_ == TStmtType::QUERY) {
             :     DCHECK(coord_instance_ != nullptr);
> I'm thinking through the implication of calling this after WaitForBackends(
I think a delay in reporting an error is okay for now (As long as that is the only negative in thee way I am doing this)

I will also take some time to think it through a bit since I haven't thought about this code for awhile.


http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc@897
PS2, Line 897:     // than streaming them to a client directly. For these types of queries we need to wait
> line too long (91 > 90)
I'll catch in next round of fixes.


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h@121
PS1, Line 121:   JoinBuilder* GetJoinBuildSink() const;
> Add CR
Done


http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS1, Line 245:       exprs.addAll(outputExprs_.subList(0,
> line too long (91 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 00:23:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17144/1//COMMIT_MSG@14
PS1, Line 14: - External frontends are responsible for managing the files after the
            :   query is completed.
I wonder how this would interact with the "retry_failed_queries" query option. That can reexecute a failed statement if no rows were fetched. Depending on how files are cleaned up, this could be a problem. We may want to throw an error if using this result sink when retry_failed_queries = True until we can test it.


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@798
PS1, Line 798: Coordinator::FinalizeResultSink()
I might be missing something, but where is this called?


http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/coordinator.cc@907
PS1, Line 907:     RETURN_IF_ERROR(UpdateExecState(coord_instance_->WaitForOpen(),
             :         &coord_instance_->runtime_state()->fragment_instance_id(), FLAGS_hostname));
I'm thinking through the implication of calling this after WaitForBackends() for QUERY statements. I think it would delay the detection of certain types of errors, but I'm not sure if there are other problems.



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Mar 2021 23:01:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#6).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 97 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#4).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 113 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 4
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 3:

(7 comments)

My general thought is that a query with a result sink is a lot more like a DML than a query, so it would be useful to reused that codepath with some minor changes.

These comments are a bit disorganized, so let me know if the idea isn't really coming through.

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h@308
PS3, Line 308:   /// Non-null if and only if the query produces results for the client; i.e. is of
             :   /// TStmtType::QUERY. Coordinator uses these to pull results from plan tree and return
             :   /// them to the client in GetNext(), and also to access the fragment instance's runtime
             :   /// state.
             :   ///
             :   /// Result rows are materialized by this fragment instance in its own thread. They are
             :   /// materialized into a QueryResultSet provided to the coordinator during GetNext().
             :   ///
             :   /// Owned by the QueryState. Set in Exec().
             :   FragmentInstanceState* coord_instance_ = nullptr;
The way we are using coord_instance_ for a query with a result sink doesn't match this comment.

I'm thinking coord_instance_ should probably remain null when doing a QUERY with a result sink. The client wouldn't be getting rows out of GetNext(). Having coord_instance_ == nullptr doesn't mean there is no fragment on the coordinator. It just means that we don't really use it in coordinator.cc.

We currently only use coord_instance_ as a place to store whether this has a result sink. Maybe we could pass that fact in separately.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@193
PS3, Line 193:   // set coord_instance_ and coord_sink_
             :   if (exec_params_.GetCoordFragment() != nullptr) {
             :     // this blocks until all fragment instances have finished their Prepare phase
             :     Status query_status = query_state_->GetFInstanceState(query_id(), &coord_instance_);
             :     if (!query_status.ok()) return UpdateExecState(query_status, nullptr, FLAGS_hostname);
             :     // We expected this query to have a coordinator instance.
             :     DCHECK(coord_instance_ != nullptr);
             :     // When writing results to file(s), we do not setup a coord_sink
             :     if (!coord_instance_->HasResultSink()) {
             :       // When GetFInstanceState() returns the coordinator instance, the Prepare phase is
             :       // done and the FragmentInstanceState's root sink will be set up.
             :       coord_sink_ = coord_instance_->GetRootSink();
             :       DCHECK(coord_sink_ != nullptr);
             :     }
             :   }
As mentioned in coordinator.h, I think coord_instance_ should be nullptr for a query with a result sink, so it probably makes more sense for QueryExecParams::GetCoordFragment() to return nullptr when there is a result sink.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@798
PS3, Line 798: Status Coordinator::FinalizeResultSink() {
I would prefer to review this when I can see how it is being called.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@805
PS3, Line 805:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname));
If this returns an error (which it should if the query is not successful), it won't continue to the hdfs_table->ReleaseResources() below and do cleanup.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@894
PS3, Line 894:   // DML finalization can only happen when all backends have completed all side-effects
             :   // and reported relevant state. We also wait for backends to complete if we are writing
             :   // results to a filesystem.
             :   // External frontends can request to have query results written to a filesystem rather
             :   // than streaming them to a client directly. For these types of queries we need to wait
             :   // for all backends to complete.
             :   if (stmt_type_ == TStmtType::DML ||
             :          (stmt_type_ == TStmtType::QUERY && coord_instance_->HasResultSink())) {
             :     WaitForBackends();
             :   }
If the below if statement does not apply to queries with result sinks, then this can move back to where it was before with a modified DCHECK.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@905
PS3, Line 905:   if (stmt_type_ == TStmtType::QUERY) {
I think it would make sense to treat a query with a result sink like a DML rather than a QUERY. So, I would make this if block only execute for a query where there is not a result sink.

Then it would fall through to the DML case, which would be broadened to be DML + QUERY with result sink.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@940
PS3, Line 940:  || coord_instance_->HasResultSink()
If a query with a result sink goes through the current DML path, then it would execute SetNonErrorTerminalState with RETURNED_RESULTS. That would satisfy the other condition on this if, so this would not be needed.

Without this change, a successful execution never transitions to a terminal state. DMLs transition in Wait(). Queries transition in GetNext() when *eos=true. This does neither, so it never moves from EXECUTING to RETURNED_RESULTS.



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 Mar 2021 00:38:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7:

> (1 comment)
 > 
 > Apart from the interaction with retry_failed_queries, this looks
 > good to me.
 > 
 > If you want to split off the retry_failed_queries piece into a
 > separate change, I am ok with that. I think this change is ok if
 > the external frontend guarantees not to pass in retry_failed_queries=true
 > when using a result sink.

I think it would be better to follow up. I've created https://issues.apache.org/jira/browse/IMPALA-10586 to track that work.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:37:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8267/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Mar 2021 18:52:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#3).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 93 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

http://gerrit.cloudera.org:8080/#/c/17144/1/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java@245
PS1, Line 245:       exprs.addAll(outputExprs_.subList(0, targetTable_.getNonClusteringColumns().size()));
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Mar 2021 17:09:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/2/be/src/runtime/coordinator.cc@897
PS2, Line 897:     // than streaming them to a client directly. For these types of queries we need to wait
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 00:19:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8337/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:42:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h
File be/src/runtime/fragment-instance-state.h:

http://gerrit.cloudera.org:8080/#/c/17144/1/be/src/runtime/fragment-instance-state.h@121
PS1, Line 121:   JoinBuilder* GetJoinBuildSink() const;
Add CR



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 1
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Mar 2021 17:09:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7: Code-Review+2

> > (1 comment)
 > >
 > > Apart from the interaction with retry_failed_queries, this looks
 > > good to me.
 > >
 > > If you want to split off the retry_failed_queries piece into a
 > > separate change, I am ok with that. I think this change is ok if
 > > the external frontend guarantees not to pass in retry_failed_queries=true
 > > when using a result sink.
 > 
 > I think it would be better to follow up. I've created
 > https://issues.apache.org/jira/browse/IMPALA-10586 to track that
 > work.

Ok, then this looks good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 21:39:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/17144
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Joe McDonnell <jo...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 99 insertions(+), 12 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@796
PS6, Line 796:   // All instances must have reported their final statuses before finalization, which is a
             :   // post-condition of Wait. Result sink file clean up is the responsibility of the
             :   // external frontend 
> This is a copy/paste from FinalizeHdfsDml(), so the second sentence doesn't
Done


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname));
> If there is an error from execution, it would show up here and this would r
I agree that retry_failed_queries might be problematic and we might want to recommend that an external frontend not enable the feature.

If I am reading the coordinator.cc code correct though, we do not retry queries with a result sink. Are there other areas I should be concerned about? I see some usage in query-driver.cc


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@807
PS6, Line 807:  
> Nit: This "0" is the table id. I'm guessing 0 is a special constant for the
I moved it to a named constant within this method since it is the only usage of it. I can also move it to DescriptorTbl if your prefer it there.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@810
PS6, Line 810:         result_sink_table_id, obj_p
> I think this should be "0" (i.e. the special table id used in the CreateHdf
I removed the table_id portion from this message/



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 23:43:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7: Verified+1

Ran GVO as a stack with the "external frontend CTAS support", and it passed, carrying +1 verified


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Mar 2021 15:17:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8352/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Sat, 13 Mar 2021 00:02:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#7).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 99 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8274/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 2
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Mar 2021 00:39:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 6:

(4 comments)

This is making sense to me.

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@796
PS6, Line 796:   // All instances must have reported their final statuses before finalization, which is a
             :   // post-condition of Wait. If the query was not successful, still try to clean up the
             :   // staging directory.
This is a copy/paste from FinalizeHdfsDml(), so the second sentence doesn't mean that we actually thought that this would remove files in the case of errors.

Based on what we do today, we should change the second sentence to say that this does not clean up files in the event of an error. The external frontend handles that (or that is my assumption).


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname));
If there is an error from execution, it would show up here and this would return. The rest of this is the success case. It sounds intentional based on other comments, and if the external frontend is doing cleanup, this makes sense.

The only concern would be query retries on the Impala side (the "retry_failed_queries" query option). I'm a bit unclear about how retry_failed_queries would interact with an external frontend, but to be safe, an external frontend would want to make sure "retry_failed_queries" is not true for these types of queries until we can test/fix it. It might be worth adding a DCHECK somewhere that the retry_failed_queries query option is false when using the result sink.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@807
PS6, Line 807: 0
Nit: This "0" is the table id. I'm guessing 0 is a special constant for the result sink? It would be nice to have this be a named constant.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@810
PS6, Line 810:     << finalize_params()->table_id;
I think this should be "0" (i.e. the special table id used in the CreateHdfsTblDescriptor() call). I think this got carried over from FinalizeHdfsDml().



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 19:17:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Thomas Tauber-Marshall, Joe McDonnell, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17144

to look at the new patch set (#5).

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................

IMPALA-10551: Add result sink support for external frontends

- The intended purpose of these changes is to allow external frontends
  to receive query results via files rather than streaming the results
  through the thrift interface.
- External frontends are expected to provide an FeFsTable implementation
  that describes the desired location to store results.
- External frontends are responsible for managing the files after the
  query is completed.
- Testing has been manual and through an implementation of an external
  frontend.

Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Reviewed-by: Aman Sinha <am...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/runtime/query-exec-params.cc
M be/src/runtime/query-exec-params.h
M common/thrift/DataSinks.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
9 files changed, 96 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17144/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 7:

(1 comment)

Apart from the interaction with retry_failed_queries, this looks good to me.

If you want to split off the retry_failed_queries piece into a separate change, I am ok with that. I think this change is ok if the external frontend guarantees not to pass in retry_failed_queries=true when using a result sink.

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname));
> I agree that retry_failed_queries might be problematic and we might want to
The retrying happens at a level above the Coordinator object (at the QueryDriver level). Each new retry would have its own ClientRequestState object (which has its own Coordinator object). Here is a comment covering some of the concepts: https://github.com/apache/impala/blob/master/be/src/runtime/query-driver.h#L75

The pieces of code in Coordinator that would trigger a retry are the TryQueryRetry() calls in this file.

I believe we did not intend for the retries to apply to DMLs, but I tried it out and we do actually retry DMLs right now. So, at the moment, there is nothing disabling retries for something with a result sink. I will file a JIRA for disabling retry_failed_queries for DMLs. I think that was unintentional.



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 7
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 18:49:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8336/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 5
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:35:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8303/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 3
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Sun, 07 Mar 2021 16:56:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10551: Add result sink support for external frontends

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 6:

(7 comments)

Thanks Joe, I think the suggested review comments has made this a better patch overall.

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.h@308
PS3, Line 308:   /// Non-null if and only if the query produces results for the client; i.e. is of
             :   /// TStmtType::QUERY. Coordinator uses these to pull results from plan tree and return
             :   /// them to the client in GetNext(), and also to access the fragment instance's runtime
             :   /// state.
             :   ///
             :   /// Result rows are materialized by this fragment instance in its own thread. They are
             :   /// materialized into a QueryResultSet provided to the coordinator during GetNext().
             :   ///
             :   /// Owned by the QueryState. Set in Exec().
             :   FragmentInstanceState* coord_instance_ = nullptr;
> The way we are using coord_instance_ for a query with a result sink doesn't
I've added a method to query-exec-params and call it in a few places to determine if there is a result sink in coordinator.cc - not sure if that is the best way.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@193
PS3, Line 193:   // set coord_instance_ and coord_sink_
             :   if (exec_params_.GetCoordFragment() != nullptr) {
             :     // this blocks until all fragment instances have finished their Prepare phase
             :     Status query_status = query_state_->GetFInstanceState(query_id(), &coord_instance_);
             :     if (!query_status.ok()) return UpdateExecState(query_status, nullptr, FLAGS_hostname);
             :     // We expected this query to have a coordinator instance.
             :     DCHECK(coord_instance_ != nullptr);
             :     // When GetFInstanceState() returns the coordinator instance, the Prepare phase is
             :     // done and the FragmentInstanceState's root sink will be set up.
             :     coord_sink_ = coord_instance_->GetRootSink();
             :     DCHECK(coord_sink_ != nullptr);
             :   }
             :   return Status::OK();
             : }
             : 
> As mentioned in coordinator.h, I think coord_instance_ should be nullptr fo
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@798
PS3, Line 798:   // staging directory.
> I would prefer to review this when I can see how it is being called.
The newest review has the call.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@805
PS3, Line 805:   DCHECK(query_ctx().__isset.desc_tbl_serialized);
> If this returns an error (which it should if the query is not successful), 
This code is based on FinalizeHdfsDml which follows a similar pattern. So I wonder if something else ends up cleaning it up in the error case.


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@894
PS3, Line 894:         && query_state_->query_options().spool_query_results
             :         && query_state_->query_options().spool_all_results_for_retries) {
             :       // Wait until the BufferedPlanRootSink spooled all results or any errors stopping
             :       // it, e.g. batch queue full, cancellation or failures.
             :       auto sink = static_cast<BufferedPlanRootSink*>(coord_sink_);
             :       if (sink->WaitForAllResultsSpooled()) {
             :         VLOG_QUERY << "Cannot spool all results in the allocated result spooling space."
             :             " Query retry will be skipped if any results have been returned.";
             :       }
             :    
> If the below if statement does not apply to queries with result sinks, then
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@905
PS3, Line 905:   }
> I think it would make sense to treat a query with a result sink like a DML 
Done


http://gerrit.cloudera.org:8080/#/c/17144/3/be/src/runtime/coordinator.cc@940
PS3, Line 940: 
> If a query with a result sink goes through the current DML path, then it wo
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 02:24:00 +0000
Gerrit-HasComments: Yes