Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/01 20:17:00 UTC
[jira] [Commented] (IMPALA-8780) Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched

    [ https://issues.apache.org/jira/browse/IMPALA-8780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898330#comment-16898330 ] 

ASF subversion and git services commented on IMPALA-8780:
---------------------------------------------------------

Commit 699450aadbf45f36617472b7c777dc2d9aad066a in impala's branch refs/heads/master from Sahil Takiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=699450a ]

IMPALA-8779, IMPALA-8780: RowBatchQueue re-factoring and BufferedPRS impl

Improves the encapsulation of RowBatchQueue by doing the following
re-factoring:
* Renames RowBatchQueue to BlockingRowBatchQueue which is more
indicative of what the queue does
* Re-factors the timers managed by the scan-node into the
BlockingRowBatchQueue implementation
* Favors composition over inheritance by re-factoring
BlockingRowBatchQueue to own a BlockingQueue rather than extending one
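
As a rough illustration of the composition point in the list above, the
sketch below shows a row batch queue that owns a generic blocking queue as
a member and keeps its own wait timer, instead of extending the queue
class. Everything in it is an assumption made for illustration: the
simplified RowBatch and BlockingQueue types, the method names, and the
total_get_wait() accessor are hypothetical and are not Impala's actual
classes or signatures.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for Impala's RowBatch; it only carries a row count.
struct RowBatch {
  explicit RowBatch(int n) : num_rows(n) {}
  int num_rows;
};

// Hypothetical, simplified generic blocking queue (unbounded for brevity).
template <typename T>
class BlockingQueue {
 public:
  void Put(T item) {
    {
      std::lock_guard<std::mutex> l(lock_);
      items_.push_back(std::move(item));
    }
    cv_.notify_one();
  }

  // Blocks until an item is available or Shutdown() was called. Returns
  // false if the queue is shut down and empty; '*out' is then left untouched.
  bool Get(T* out) {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return !items_.empty() || shutdown_; });
    if (items_.empty()) return false;
    *out = std::move(items_.front());
    items_.pop_front();
    return true;
  }

  void Shutdown() {
    {
      std::lock_guard<std::mutex> l(lock_);
      shutdown_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<T> items_;
  bool shutdown_ = false;
};

// Composition: the row batch queue HAS a BlockingQueue and exposes only a
// narrow, RowBatch-specific interface. The wait timer lives here rather than
// in the caller. The timer is not synchronized, so this sketch assumes a
// single consumer thread.
class BlockingRowBatchQueue {
 public:
  void AddBatch(std::unique_ptr<RowBatch> batch) { queue_.Put(std::move(batch)); }

  // Returns the next batch, or nullptr once the queue is shut down and empty.
  std::unique_ptr<RowBatch> GetBatch() {
    auto start = std::chrono::steady_clock::now();
    std::unique_ptr<RowBatch> batch;
    queue_.Get(&batch);
    total_get_wait_ += std::chrono::steady_clock::now() - start;
    return batch;
  }

  void Shutdown() { queue_.Shutdown(); }

  std::chrono::steady_clock::duration total_get_wait() const {
    return total_get_wait_;
  }

 private:
  BlockingQueue<std::unique_ptr<RowBatch>> queue_;
  std::chrono::steady_clock::duration total_get_wait_ =
      std::chrono::steady_clock::duration::zero();
};
```

Because the wrapper does not inherit from BlockingQueue, callers cannot
reach the generic Put/Get methods directly, which is the encapsulation
benefit the list describes.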

The re-factoring lays the groundwork for introducing a generic
RowBatchQueue that all RowBatch queues inherit from.

Adds a new DequeRowBatchQueue which is a simple wrapper around a
std::deque that (1) stores unique_ptr to queued RowBatch-es and (2)
has a maximum capacity.
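
A minimal sketch of what such a wrapper could look like is given below. It
only assumes what the paragraph above states (a std::deque of unique_ptr
with a maximum capacity); the RowBatch stand-in, the method names, and the
choice to report a full queue through a bool return value are hypothetical
and not taken from the Impala source.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for Impala's RowBatch; it only carries a row count.
struct RowBatch {
  explicit RowBatch(int n) : num_rows(n) {}
  int num_rows;
};

// Bounded FIFO queue of owned RowBatches. Not thread safe: callers are
// expected to provide their own synchronization.
class DequeRowBatchQueue {
 public:
  explicit DequeRowBatchQueue(std::size_t max_batches)
    : max_batches_(max_batches) {
    assert(max_batches > 0);
  }

  // Takes ownership of 'batch' and returns true if there is room. Returns
  // false and leaves 'batch' untouched if the queue is full.
  bool AddBatch(std::unique_ptr<RowBatch>&& batch) {
    if (IsFull()) return false;
    batches_.push_back(std::move(batch));
    return true;
  }

  // Returns the oldest batch, or nullptr if the queue is empty.
  std::unique_ptr<RowBatch> GetBatch() {
    if (batches_.empty()) return nullptr;
    std::unique_ptr<RowBatch> batch = std::move(batches_.front());
    batches_.pop_front();
    return batch;
  }

  bool IsFull() const { return batches_.size() >= max_batches_; }
  bool IsEmpty() const { return batches_.empty(); }

 private:
  const std::size_t max_batches_;
  std::deque<std::unique_ptr<RowBatch>> batches_;
};
```

Taking the batch by rvalue reference and moving it only on success means a
rejected batch stays with the caller, who can retry once space frees up.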

Implements BufferedPlanRootSink using the new DequeRowBatchQueue.
DequeRowBatchQueue is generic enough that replacing it with a
SpillableQueue (queue backed by a BufferedTupleStream) should be
straightforward. BufferedPlanRootSink is synchronized to protect access
to DequeRowBatchQueue since the queue is not thread safe.

BufferedPlanRootSink FlushFinal blocks until the consumer thread has
processed all RowBatches. This ensures that the coordinator fragment
stays alive until all results are fetched, but allows all other
fragments to be shut down immediately.
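
The blocking behavior described above can be sketched with a mutex and
condition variables. This is an illustration under stated assumptions, not
the actual BufferedPlanRootSink: the class name BufferedSinkSketch, the
simplified RowBatch, the FetchAllRows helper, and the use of a plain
std::deque are all hypothetical, and "processed" is approximated here as
"dequeued by the consumer".

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for Impala's RowBatch; it only carries a row count.
struct RowBatch {
  explicit RowBatch(int n) : num_rows(n) {}
  int num_rows;
};

class BufferedSinkSketch {
 public:
  explicit BufferedSinkSketch(std::size_t max_batches)
    : max_batches_(max_batches) {
    assert(max_batches > 0);
  }

  // Producer side: enqueues a batch, blocking while the queue is full.
  void Send(std::unique_ptr<RowBatch> batch) {
    std::unique_lock<std::mutex> l(lock_);
    not_full_.wait(l, [this] { return queue_.size() < max_batches_; });
    queue_.push_back(std::move(batch));
    not_empty_.notify_one();
  }

  // Producer side: signals that no more batches will be sent, then blocks
  // until the consumer has dequeued everything that was buffered.
  void FlushFinal() {
    std::unique_lock<std::mutex> l(lock_);
    done_sending_ = true;
    not_empty_.notify_all();
    drained_.wait(l, [this] { return queue_.empty(); });
  }

  // Consumer side: returns the next batch, or nullptr once the producer has
  // called FlushFinal() and the queue is empty.
  std::unique_ptr<RowBatch> GetNext() {
    std::unique_lock<std::mutex> l(lock_);
    not_empty_.wait(l, [this] { return !queue_.empty() || done_sending_; });
    if (queue_.empty()) return nullptr;
    std::unique_ptr<RowBatch> batch = std::move(queue_.front());
    queue_.pop_front();
    not_full_.notify_one();
    if (queue_.empty()) drained_.notify_all();
    return batch;
  }

 private:
  const std::size_t max_batches_;
  std::mutex lock_;  // Protects queue_ and done_sending_.
  std::condition_variable not_full_, not_empty_, drained_;
  std::deque<std::unique_ptr<RowBatch>> queue_;
  bool done_sending_ = false;
};

// Usage: a producer thread sends batches and then blocks in FlushFinal()
// while the calling thread acts as the consumer and fetches all rows.
int FetchAllRows(int num_batches, int rows_per_batch) {
  BufferedSinkSketch sink(2);
  std::thread producer([&] {
    for (int i = 0; i < num_batches; ++i) {
      sink.Send(std::make_unique<RowBatch>(rows_per_batch));
    }
    sink.FlushFinal();
  });
  int total = 0;
  while (std::unique_ptr<RowBatch> batch = sink.GetNext()) {
    total += batch->num_rows;
  }
  producer.join();
  return total;
}
```

In this sketch the producer thread cannot leave FlushFinal() before the
queue is empty, which mirrors how the coordinator fragment is kept alive
until results are fetched.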

Testing:
* Running core tests
* Updated tests/query_test/test_result_spooling.py

Change-Id: I9b1bb4b9c6f6e92c70e8fbee6ccdf48c2f85b7be
Reviewed-on: http://gerrit.cloudera.org:8080/13883
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Implementation of BufferedPlanRootSink where FlushFinal blocks until all rows are fetched
> -----------------------------------------------------------------------------------------
>
>                 Key: IMPALA-8780
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8780
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Implement {{BufferedPlanRootSink}} so that {{FlushFinal}} blocks until all rows are fetched. The implementation should use the {{RowBatchQueue}} introduced by IMPALA-8779. By blocking in {{FlushFinal}}, all non-coordinator fragments will be closed if all results fit in the {{RowBatchQueue}}. {{BufferedPlanRootSink::Send}} should enqueue each given {{RowBatch}} onto the queue and then return. If the queue is full, it should block until space becomes available in the queue. {{BufferedPlanRootSink::GetNext}} reads from the queue and then fills in the given {{QueryResultSet}} by using the {{DataSink}} {{ScalarExprEvaluator}}-s. Since the producer thread can call {{BufferedPlanRootSink::Close}} while the consumer is calling {{BufferedPlanRootSink::GetNext}}, the two methods need to be synchronized so that the {{DataSink}} {{MemTracker}}-s are not closed while {{GetNext}} is running.
> The implementation of {{BufferedPlanRootSink}} should remain the same regardless of whether a {{std::queue}} backed {{RowBatchQueue}} or a {{BufferedTupleStream}} backed {{RowBatchQueue}} is used.
> {{BufferedPlanRootSink}} and {{BlockingPlanRootSink}} are similar in the sense that {{BlockingPlanRootSink}} buffers one {{RowBatch}}, so for queries that return under 1024 rows, all non-coordinator fragments are closed immediately as well. The advantage of {{BufferedPlanRootSink}} is that it allows buffering of more than one {{RowBatch}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
