You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2019/12/19 07:11:00 UTC

[kudu-CR] log: use RWFile in log segments

Hello Andrew Wong,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/14931

to review the following change.


Change subject: log: use RWFile in log segments
......................................................................

log: use RWFile in log segments

I'm working towards the eventual goal of caching log segments in the file
cache, but for that to work, the segments' use of file handles must be safe
for caching. Currently they're not:
1. WritableLogSegment uses WritableFile, which has mutable in-memory state.
2. The "active" segment in the log has both a writable and readable log
   segment open at the same time, each of which uses a different file handle
   implementation. The file cache could allow that, but it's complexity that
   I'd rather avoid.
3. The log expects the "active" segment to be truncated and sync'ed when
   closed, but that's not safe in a world where the segment may drop out of
   the cache and get closed arbitrarily.

We can fix all of these issues by converting log segments to use RWFile.
It's not "pure" (especially for ReadableLogSegment which doesn't write), but
it gets the job done.

There are some notable semantic changes:
1. The log must explicitly truncate/sync when finishing the active segment.
2. Because both kinds of segments now use the same file handle abstraction,
   we no longer need to close and reopen the underlying file handle when
   finishing the active segment; we can just share the existing file handle.
3. Likewise, when opening a ReadableLogSegment for the just-switched-to
   active segment, we can reuse the file handle from the WritableLogSegment
   rather than opening a second file handle.

Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
---
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 135 insertions(+), 146 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/14931/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>

[kudu-CR] log: use RWFile in log segments

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14931/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/14931/1/src/kudu/consensus/log.cc@549
PS1, Line 549: opts_->preallocate_segments &&
             :       active_segment_->written_offset() < max_segment_size_
Why are these conditions important? Mind commenting?



-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 19 Dec 2019 08:28:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] log: use RWFile in log segments

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14931

to look at the new patch set (#2).

Change subject: log: use RWFile in log segments
......................................................................

log: use RWFile in log segments

I'm working towards the eventual goal of caching log segments in the file
cache, but for that to work, the segments' use of file handles must be safe
for caching. Currently they're not:
1. WritableLogSegment uses WritableFile, which has mutable in-memory state.
2. The "active" segment in the log has both a writable and readable log
   segment open at the same time, each of which uses a different file handle
   implementation. The file cache could allow that, but it's complexity that
   I'd rather avoid.
3. The log expects the "active" segment to be truncated and sync'ed when
   closed, but that's not safe in a world where the segment may drop out of
   the cache and get closed arbitrarily.

We can fix all of these issues by converting log segments to use RWFile.
It's not "pure" (especially for ReadableLogSegment which doesn't write), but
it gets the job done.

There are some notable semantic changes:
1. The log must explicitly truncate/sync when finishing the active segment.
2. Because both kinds of segments now use the same file handle abstraction,
   we no longer need to close and reopen the underlying file handle when
   finishing the active segment; we can just share the existing file handle.
3. Likewise, when opening a ReadableLogSegment for the just-switched-to
   active segment, we can reuse the file handle from the WritableLogSegment
   rather than opening a second file handle.

Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
---
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 142 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/14931/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] log: use RWFile in log segments

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................

log: use RWFile in log segments

I'm working towards the eventual goal of caching log segments in the file
cache, but for that to work, the segments' use of file handles must be safe
for caching. Currently they're not:
1. WritableLogSegment uses WritableFile, which has mutable in-memory state.
2. The "active" segment in the log has both a writable and readable log
   segment open at the same time, each of which uses a different file handle
   implementation. The file cache could allow that, but it's complexity that
   I'd rather avoid.
3. The log expects the "active" segment to be truncated and sync'ed when
   closed, but that's not safe in a world where the segment may drop out of
   the cache and get closed arbitrarily.

We can fix all of these issues by converting log segments to use RWFile.
It's not "pure" (especially for ReadableLogSegment which doesn't write), but
it gets the job done.

There are some notable semantic changes:
1. The log must explicitly truncate/sync when finishing the active segment.
2. Because both kinds of segments now use the same file handle abstraction,
   we no longer need to close and reopen the underlying file handle when
   finishing the active segment; we can just share the existing file handle.
3. Likewise, when opening a ReadableLogSegment for the just-switched-to
   active segment, we can reuse the file handle from the WritableLogSegment
   rather than opening a second file handle.

Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Reviewed-on: http://gerrit.cloudera.org:8080/14931
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 147 insertions(+), 151 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] log: use RWFile in log segments

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc@549
PS2, Line 549: // Normally, a segment is only finished when it becomes full. However, in some
             :   // cases (e.g. Log::Close) we may finish a segment that isn't full. In such
             :   // cases, let's return the excess preallocated space back to the filesystem by
             :   // truncating off the unused end of the segment.
> nit: this explains that we're truncating, but not why we're truncating cond
Well, it hints at how preallocation is related (i.e. to return excess preallocated space): if there's no preallocation, there's no preallocated space to speak of.

As for segment size, isn't it self-evident that if we've exceeded max_segment_size_ (which is the amount we've preallocated), there's no excess preallocation space to speak of, and thus nothing to truncate?

I'll try to work that into the comment I guess.



-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 19 Dec 2019 22:15:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] log: use RWFile in log segments

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc@549
PS2, Line 549: // Normally, a segment is only finished when it becomes full. However, in some
             :   // cases (e.g. Log::Close) we may finish a segment that isn't full. In such
             :   // cases, let's return the excess preallocated space back to the filesystem by
             :   // truncating off the unused end of the segment.
nit: this explains that we're truncating, but not why we're truncating conditionally. I.e. why condition on preallocation and segment size at all?



-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 19 Dec 2019 18:19:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] log: use RWFile in log segments

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/14931/2/src/kudu/consensus/log.cc@549
PS2, Line 549: // max_segment_size_ defines the (soft) limit of a segment. When preallocation
             :   // is enabled, max_segment_size also defines the amount of space that is
             :   // preallocated at segment creation time.
             :   //
> Well, it hints at how preallocation is related (i.e. to return excess preal
Thanks.

Yeah, I just wanted to make sure there relationship between preallocation and truncation was clearer, and why truncation doesn't make sense if we're not preallocating.



-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 19 Dec 2019 22:35:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] log: use RWFile in log segments

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14931

to look at the new patch set (#3).

Change subject: log: use RWFile in log segments
......................................................................

log: use RWFile in log segments

I'm working towards the eventual goal of caching log segments in the file
cache, but for that to work, the segments' use of file handles must be safe
for caching. Currently they're not:
1. WritableLogSegment uses WritableFile, which has mutable in-memory state.
2. The "active" segment in the log has both a writable and readable log
   segment open at the same time, each of which uses a different file handle
   implementation. The file cache could allow that, but it's complexity that
   I'd rather avoid.
3. The log expects the "active" segment to be truncated and sync'ed when
   closed, but that's not safe in a world where the segment may drop out of
   the cache and get closed arbitrarily.

We can fix all of these issues by converting log segments to use RWFile.
It's not "pure" (especially for ReadableLogSegment which doesn't write), but
it gets the job done.

There are some notable semantic changes:
1. The log must explicitly truncate/sync when finishing the active segment.
2. Because both kinds of segments now use the same file handle abstraction,
   we no longer need to close and reopen the underlying file handle when
   finishing the active segment; we can just share the existing file handle.
3. Likewise, when opening a ReadableLogSegment for the just-switched-to
   active segment, we can reuse the file handle from the WritableLogSegment
   rather than opening a second file handle.

Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
---
M src/kudu/consensus/log-test.cc
M src/kudu/consensus/log.cc
M src/kudu/consensus/log.h
M src/kudu/consensus/log_util.cc
M src/kudu/consensus/log_util.h
M src/kudu/tserver/tablet_copy_service-test.cc
M src/kudu/tserver/tablet_copy_source_session.cc
M src/kudu/tserver/tablet_copy_source_session.h
8 files changed, 147 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/14931/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] log: use RWFile in log segments

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14931 )

Change subject: log: use RWFile in log segments
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14931/1/src/kudu/consensus/log.cc
File src/kudu/consensus/log.cc:

http://gerrit.cloudera.org:8080/#/c/14931/1/src/kudu/consensus/log.cc@549
PS1, Line 549: opts_->preallocate_segments &&
             :       active_segment_->written_offset() < max_segment_size_
> Why are these conditions important? Mind commenting?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65b39e219e76876df16e698211eb558ab31329c8
Gerrit-Change-Number: 14931
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 19 Dec 2019 09:15:04 +0000
Gerrit-HasComments: Yes