You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Dan Hecht (Code Review)" <ge...@cloudera.org> on 2017/02/02 00:23:48 UTC

[Impala-ASF-CR] IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool

Dan Hecht has posted comments on this change.

Change subject: IMPALA-4674: Part 1: port BufferedTupleStream to BufferPool
......................................................................


Patch Set 4:

(32 comments)

First set of comments focusing on the public interface.

http://gerrit.cloudera.org:8080/#/c/5811/4/be/src/runtime/buffered-tuple-stream-v2.h
File be/src/runtime/buffered-tuple-stream-v2.h:

PS4, Line 41:  or scratch disk storage
what does this mean? isn't it always backed by BufferPool (which  may spill to disk internally)?


PS4, Line 45: the APIs are called from a single thread
this is saying per BufferTupleStream object, right? maybe clarify that


Line 46: ///
it'd be nice to explain the point of this abstraction a bit more explicitly. like when we talk about BTS, we talk in terms of iterators and the iteration patterns, but this only alludes to that. and the random access pattern. etc.


PS4, Line 49: pinned or unpinned
I think these terms need to be defined relative to the stream as far as what it means to a client and why they would change between states.


PS4, Line 59: page size
wouldn't it also be a function of tuple size (i.e. tuples per page)?


PS4, Line 77: pages
pages' buffers.


PS4, Line 80: location in memory
or you could say "buffer".


PS4, Line 110: in pages_
in the stream


PS4, Line 111: in pages_
same


PS4, Line 114: (if read_write is
             : ///   true, then two pages need to be in memory).
rather than saying this in parenthesis, i think we should pull it up to say that there can be both a read and write iterator simultaneously.


PS4, Line 117: caller's reservation is insufficient
does it first try to increase the reservation at this point?


Line 124: /// returned so far from the stream may be freed on the next call to GetNext().
do we have a jira we can reference as a TODO here to simplify this?


Line 126: /// Manual construction of rows with AllocateRow():
not for now, but is it possible to unify things so we only have one way to write into the stream?

In the mean time, how does this mode relate to the "write" mode above? is it a sub-mode for pinned?


Line 138: ///     this array as new rows are inserted in the page. If we do so, then there will be
is that still something we want to do?


PS4, Line 146: id
what is 'id'? is this index?


PS4, Line 147: With 8MB pages that is 512GB per stream.
where does this limit come into play?  is this something that will cause trouble with going to 2MB pages?


PS4, Line 186: page_len: the size of pages to use in the stream
what does that mean for large rows?


PS4, Line 198: of
off


PS4, Line 203: AddRow
is it also used before AllocateRow()?


Line 207:   ///     if an error status is returned.
broken line break


Line 212:   /// sets 'status' to an error if an error occurred.
hard to parse that sentence.  but, can we simplify this dance now that we have real reservations?
also, does this operation try to increase reservation before failing to append?
is this guaranteed to succeed (other than runtime errors) for unpinned streams (maybe after you add support for large rows)?


Line 223:   /// tuples.
same questions. let's think of this interface can be simplified now.


Line 228:   /// been allocated with the stream's row desc.
is the row valid only as long as the stream remains pinned?


PS4, Line 232: begin reading
before the first GetNext()?


Line 236:   ///     false if the page could not be pinned and no error was encountered.
can we explain that in terms of reservations? and like questions above, does this got_buffer==false interface still make sense with reservations?


PS4, Line 240: enough memory
what does that mean in terms of reservations?  and does it still make sense with reservations?


Line 253:   };
this seems like a wrinkle to the class comment. can it be succinctly described as part of the iterator discussion?


PS4, Line 259: must be copied out by the caller
this seems to contradict the class comment about "needs_deep_copy"


PS4, Line 267: *got_rows is false if the stream could not be pinne
does that still make sense with reservations?  if so, it should be expressed in terms of reservation.


PS4, Line 271: date
data?


PS4, Line 271: buffers containing page
are there buffers that don't contain page data?


PS4, Line 286: Returns the byte size of the stream that is currently pinned in memory.
Is this trying to say:
Returns the number of bytes currently pinned in memory by the stream?


-- 
To view, visit http://gerrit.cloudera.org:8080/5811
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7bb47a818564ac19a6be4ca88db0578f4ea0b709
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes