Posted to issues-all@impala.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2019/08/01 19:56:00 UTC
[jira] [Commented] (IMPALA-8818) Replace deque queue with spillable queue in BufferedPlanRootSink

    [ https://issues.apache.org/jira/browse/IMPALA-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898319#comment-16898319 ] 

Sahil Takiar commented on IMPALA-8818:
--------------------------------------

Makes sense. I'm considering adding two query options then:
 * {{MAX_PINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of pinned memory used for result spooling by setting a max reservation for the {{PlanRootSink}}
 ** In terms of the actual code, this will be used to set {{TBackendResourceProfile.max_reservation}}
 ** A value of 0 means the memory is unbounded, so no max reservation is set (which means {{Long.MAX_VALUE}} is used for the max reservation value), but as you said, the query-wide limit still applies
 ** Considering a default of 100 MB
 * {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} - limits the max amount of unpinned memory used for result spooling
 ** I think this requires some changes to {{BufferedTupleStream}} to track how much of its memory is unpinned (e.g. add an unpinned version of {{BufferedTupleStream::BytesPinned}})
 ** Based on my understanding of {{BufferedTupleStream}}, a call to {{UnpinStream}} unpins all the pages in the stream; this means that {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}} must be >= {{MAX_PINNED_RESULT_SPOOLING_MEMORY}} so that when {{UnpinStream}} is called, we don't exceed the value of {{MAX_UNPINNED_RESULT_SPOOLING_MEMORY}}
 ** I don't see a straightforward way to make this a hard limit because unpinned pages are not reserved (maybe I'm missing something), but I think for now it is sufficient to make this a soft limit (e.g. adding a {{RowBatch}} to the stream may push the amount of unpinned memory over the limit, but attempts to add additional batches will block)
 ** Considering a default of 1 GB
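
To make the proposed semantics concrete, here is a rough sketch of how the two options could be interpreted. All names in it ({{SpoolingLimits}}, {{EffectiveMaxReservation}}, {{LimitsAreConsistent}}, {{CanAddBatch}}) are hypothetical and not existing Impala code. It only illustrates the "0 means unbounded" mapping, the unpinned >= pinned constraint, and the soft-limit check described above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical holder for the two proposed query options (values in bytes).
struct SpoolingLimits {
  int64_t max_pinned_bytes;    // MAX_PINNED_RESULT_SPOOLING_MEMORY
  int64_t max_unpinned_bytes;  // MAX_UNPINNED_RESULT_SPOOLING_MEMORY
};

// A value of 0 means "unbounded": no max reservation is set, which is
// represented here by the largest int64 value (the analogue of Long.MAX_VALUE).
inline int64_t EffectiveMaxReservation(const SpoolingLimits& limits) {
  assert(limits.max_pinned_bytes >= 0);
  return limits.max_pinned_bytes == 0 ? std::numeric_limits<int64_t>::max()
                                      : limits.max_pinned_bytes;
}

// UnpinStream unpins every page at once, so all previously pinned bytes become
// unpinned together. The unpinned limit must therefore be at least as large as
// the pinned limit. Assumption: both limits are finite, non-zero values.
inline bool LimitsAreConsistent(const SpoolingLimits& limits) {
  return limits.max_unpinned_bytes >= limits.max_pinned_bytes;
}

// Soft limit: a batch is admitted whenever current usage is still under the
// limit, so one batch may push usage over it. Once over the limit, further
// batches are refused (the caller would block) until usage drops again.
inline bool CanAddBatch(int64_t unpinned_bytes, const SpoolingLimits& limits) {
  return unpinned_bytes < limits.max_unpinned_bytes;
}
```

With the defaults under consideration (100 MB pinned, 1 GB unpinned), {{LimitsAreConsistent}} holds, and {{CanAddBatch}} only starts returning false after the unpinned bytes have reached 1 GB.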

A few things I'm still trying to understand in BTS:
 * When a stream is unpinned, are new pages pinned or unpinned?
 * When do unpinned pages get spilled to disk / what decides if unpinned pages are spilled?

> Replace deque queue with spillable queue in BufferedPlanRootSink
> ----------------------------------------------------------------
>
>                 Key: IMPALA-8818
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8818
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> Add a {{SpillableRowBatchQueue}} to replace the {{DequeRowBatchQueue}} in {{BufferedPlanRootSink}}. The {{SpillableRowBatchQueue}} will wrap a {{BufferedTupleStream}} and take in a {{TBackendResourceProfile}} created by {{PlanRootSink#computeResourceProfile}}.
> *BufferedTupleStream Usage*:
> The wrapped {{BufferedTupleStream}} should be created in 'attach_on_read' mode so that pages are attached to the output {{RowBatch}} in {{BufferedTupleStream::GetNext}}. The BTS should start off as pinned (i.e. all pages are pinned). If a call to {{BufferedTupleStream::AddRow}} returns false (it returns false if "the unused reservation was not sufficient to add a new page to the stream large enough to fit 'row' and the stream could not increase the reservation to get enough unused reservation"), the queue should unpin the stream ({{BufferedTupleStream::UnpinStream}}) and then try to add the row again. If the row still cannot be added, an error must have occurred (perhaps an IO error), in which case the error should be returned and the query should fail.
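
The add / unpin / retry flow described above could look roughly like the sketch below. {{FakeTupleStream}} is a simplified stand-in written only for this illustration; the real {{BufferedTupleStream}} has a different and richer interface (status objects, tuple rows, a buffer pool client). {{AddRowWithSpillFallback}} is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for BufferedTupleStream (NOT the real Impala class).
class FakeTupleStream {
 public:
  explicit FakeTupleStream(int64_t pinned_capacity_bytes)
      : pinned_capacity_(pinned_capacity_bytes) {}

  // While pinned, returns false if the row does not fit in the pinned capacity.
  // Once unpinned, rows are always accepted unless an IO error is simulated,
  // in which case 'error' is set and false is returned.
  bool AddRow(int64_t row_bytes, std::string* error) {
    assert(row_bytes > 0);
    if (pinned_) {
      if (bytes_ + row_bytes > pinned_capacity_) return false;
    } else if (simulate_io_error_) {
      *error = "simulated IO error";
      return false;
    }
    bytes_ += row_bytes;
    return true;
  }

  void UnpinStream() { pinned_ = false; }
  bool is_pinned() const { return pinned_; }
  int64_t bytes() const { return bytes_; }
  void set_simulate_io_error(bool v) { simulate_io_error_ = v; }

 private:
  int64_t pinned_capacity_;
  int64_t bytes_ = 0;
  bool pinned_ = true;
  bool simulate_io_error_ = false;
};

// Try to add the row. If that fails on a pinned stream, unpin it and retry
// once. A failure after unpinning is treated as an error for the query.
inline bool AddRowWithSpillFallback(FakeTupleStream* stream, int64_t row_bytes,
                                    std::string* error) {
  if (stream->AddRow(row_bytes, error)) return true;
  if (stream->is_pinned()) {
    stream->UnpinStream();
    if (stream->AddRow(row_bytes, error)) return true;
  }
  if (error->empty()) *error = "row could not be added after unpinning";
  return false;
}
```

In this sketch the stream is unpinned at most once and stays unpinned afterwards; whether the real queue should ever re-pin the stream is not covered here.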
> *Constraining Resources*:
> When result spooling is disabled, a user can run a {{select * from [massive-fact-table]}} and scroll through the results without affecting the health of the Impala cluster (assuming they close the query promptly). Impala will stream the results one batch at a time to the user.
> With result spooling, a naive implementation might try to buffer the entire fact table and end up spilling all of its contents to disk, which can potentially take up a large amount of space. So there need to be restrictions on the memory and disk space used by the {{BufferedTupleStream}} to ensure that a scan of a massive table does not consume all the memory or disk space of the Impala coordinator.
> This problem can be solved by placing a max size on the amount of unpinned memory (perhaps through a new config option {{MAX_PINNED_RESULT_SPOOLING_MEMORY}}, maybe set to a few GBs by default). The max amount of pinned memory should already be constrained by the reservation (see next paragraph). {{NUM_ROWS_PRODUCED_LIMIT}} already limits the number of rows returned by a query, and so it should limit the number of rows buffered by the BTS as well (although it is set to 0 by default). {{SCRATCH_LIMIT}} already limits the amount of disk space used for spilling (although it is set to -1 by default).
> The {{PlanRootSink}} should attempt to accurately estimate how much memory it needs to buffer all results in memory. This requires setting an accurate value of {{ResourceProfile#memEstimateBytes_}} in {{PlanRootSink#computeResourceProfile}}. If statistics are available, the estimate can be based on the number of estimated rows returned multiplied by the size of the rows returned. The min reservation should account for a read and write page for the {{BufferedTupleStream}}.
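
As a rough illustration of the estimate described above: the planner code itself is Java, so the C++ below is only a sketch of the arithmetic, and {{SinkResourceEstimate}} and {{EstimateSinkResources}} are hypothetical names. The fallback value used when no statistics are available is an assumption, since the description does not say what to do in that case. Overflow of the multiplication is ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the memory estimate and the minimum reservation.
struct SinkResourceEstimate {
  int64_t mem_estimate_bytes;
  int64_t min_reservation_bytes;
};

// estimated_rows < 0 stands for "no statistics available" (an assumption of
// this sketch); in that case the caller-supplied fallback estimate is used.
inline SinkResourceEstimate EstimateSinkResources(int64_t estimated_rows,
                                                  int64_t avg_row_bytes,
                                                  int64_t page_bytes,
                                                  int64_t fallback_estimate_bytes) {
  assert(avg_row_bytes > 0 && page_bytes > 0);
  // Estimate: number of estimated rows multiplied by the size of each row.
  const int64_t estimate = estimated_rows >= 0
      ? estimated_rows * avg_row_bytes
      : fallback_estimate_bytes;
  // Min reservation: one read page plus one write page for the stream.
  const int64_t min_reservation = 2 * page_bytes;
  return SinkResourceEstimate{estimate, min_reservation};
}
```

For example, 1000 estimated rows of 64 bytes each with 8 KB pages gives an estimate of 64000 bytes and a min reservation of 16384 bytes.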


