You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/04/07 03:17:05 UTC

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Hello David Ribeiro Alves, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6583

to review the following change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................

env: add RWFile::GetExtentMap for analyzing file extents

This patch introduces a method to get the extent metadata of a file provided
it resides on an extent-based filesystem (such as ext4 or xfs). Each extent
is an offset and length into the file, and represents a chunk of filesystem
that has been allocated for the file. Gaps between extents are expected to
be unallocated and may represent punched out holes.

On Linux, the extent listing is retrieved via repeated calls to the
FS_IOC_FIEMAP ioctl, though only some of the information returned is
actually used.

I intend to use this in the log block manager for two purposes:
- Discovering whether a container has allocated space beyond its file size.
- Finding all of a container's "unpunched" holes, and repunching them.

Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
4 files changed, 170 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6583/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 7: Verified+1

Filed KUDU-1976 for the unrelated failure.

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 875:   char buf[1024];
             :   memset(buf, 0, sizeof(buf));
I think it's possible to write 

char buf[1024] = { 0 };

instead


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 728: OVERRIDE
nit: consider replacing with C++11-ish

Status GetExtentMap(ExtentMap* out) const override


PS1, Line 737: unique_ptr<char[]> buf(new char[kBufSize]);
Is it crucial to allocate this on the heap?  Would stack allocated array suffice?  It could allow to get rid of memset() call as well:

char buf[kBufSize] = { 0 };

Or even

uint8_t buf[kBufSize] = { 0 };


PS1, Line 747: ~(0ULL)
Would using FIEMAP_MAX_OFFSET macro be more idiomatic?


PS1, Line 749:       int ret = ioctl(fd_, FS_IOC_FIEMAP, fm);
             :       if (ret == -1) {
nit: since the 'ret' variable is not referenced anywhere else, consider

if (ioctl(...) == -1) {
  ...;
}


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 875:   char buf[1024];
             :   memset(buf, 0, sizeof(buf));
> I think it's possible to write 
No longer an issue; I'm now going to preallocate the file instead of writing zeroes to it.


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 728: OVERRIDE
> nit: consider replacing with C++11-ish
Just trying to be consistent with the rest of the file.


PS1, Line 737: unique_ptr<char[]> buf(new char[kBufSize]);
> Is it crucial to allocate this on the heap?  Would stack allocated array su
I was worried about stack allocating a large buffer (say, a megabyte). But I suppose 4K isn't too bad.


PS1, Line 747: ~(0ULL)
> Would using FIEMAP_MAX_OFFSET macro be more idiomatic?
Ah neat, didn't know about that.


PS1, Line 749:       int ret = ioctl(fd_, FS_IOC_FIEMAP, fm);
             :       if (ret == -1) {
> nit: since the 'ret' variable is not referenced anywhere else, consider
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 7: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 896: LOG(INFO)
> how big can this get? maybe VLOG?
I only get one extent on my laptop. I don't expect it to be big, and I also don't mind logging in tests if it helps debugging.


PS1, Line 908:   uint64_t found_offset = 0;
             :   for (const auto& e : extents) {
             :     if (e.second >= (fs_block_size * 3)) {
             :       found_offset = e.first + fs_block_size;
             :       break;
             :     }
             :   }
             :   if (found_offset == 0) {
             :     LOG(INFO) << "Couldn't find extent to split, skipping this part of the test";
             :     return;
             :   }
> how likely is this to happen? are we sure we won't be skipping this test mo
It's virtually impossible. One of the following would have to be true (assuming a 4096 fs block size):
1. In response to our 1k-at-a-time writes, the fs stupidly allocated one extent (sized to one block) every 4th write. Definitely won't happen with an ext4 that supports delayed allocation.
2. The filesystem was so fragmented that it couldn't even do a single 12k allocation.

But, I didn't want to assume anything about the test environment, so the code is forgiving.

On second thought, I'll eliminate case #1 by growing the file by preallocation instead of 1k-at-a-time writes.


PS1, Line 934: LOG(INFO)
> maybe VLOG?
Like I wrote above, I don't mind logging tests, and it can be helpful.


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env.h
File src/kudu/util/env.h:

PS1, Line 587:   // If the underlying filesystem does not support extents, map entries
             :   // represent runs of adjacent fixed-size filesystem blocks instead.
> well in the apple case this does nothing, right? maybe elaborate a bit more
Sorry, this is actually intended to cover ext3.

I'll add another note for other platforms.


PS1, Line 589: typedef std::map<uint64_t, uint64_t> ExtentMap;
> I like this way of typedeffing method arguments, maybe we should add this t
I'm not actually sure that it's the best approach for C++11. I think we should be doing:

  using ExtentMap = std::map<uint64_t, uint64_t>;

But I didn't want to rock the boat too much.


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 732: TRACE_EVENT1("io", "PosixRWFile::GetExtentMap", "path", filename_);
> why is this traceable? do we get extent maps online?
It's traceable because I believe the convention for all env_posix.cc functions is to trace if it involves any kind of IO (i.e. if it can block).


Line 740:     struct fiemap* fm = (struct fiemap*)buf.get();
> warning: C-style casts are discouraged; use reinterpret_cast [google-readab
Done


PS1, Line 764: LOG(WARNING) << Substitute(
             :               "Unexpected extent found at offset $0", fme[i].fe_logical);
> log fatal/log error? is the data returned by this method still useful if th
I wasn't sure whether to trust that the kernel isn't buggy and crash if something unexpected happened, or whether to be more forgiving. But you're right that it's tough to reason about what to do with the data if the kernel gets it wrong.

I'll change this to be more strict.


PS1, Line 767: bool inserted = InsertIfNotPresent(&extents,
             :                                            fme[i].fe_logical,
             :                                            fme[i].fe_length);
> we can get repeated extents? if not InsertOrDie?
Done


PS1, Line 771: // Two extents at the same logical offset? Shouldn't be possible.
             :           LOG(WARNING) << Substitute(
             :               "Multiple extents found at offset $0. Ignoring",
             :               fme[i].fe_logical);
> same comment as the case above
Done


PS1, Line 776: \n
> why the \n?
My mistake.


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


env: add RWFile::GetExtentMap for analyzing file extents

This patch introduces a method to get the extent metadata of a file provided
it resides on an extent-based filesystem (such as ext4 or xfs). Each extent
is an offset and length into the file, and represents a chunk of filesystem
that has been allocated for the file. Gaps between extents are expected to
be unallocated and may represent punched out holes.

On Linux, the extent listing is retrieved via repeated calls to the
FS_IOC_FIEMAP ioctl, though only some of the information returned is
actually used.

Originally I intended to use this in the log block manager for finding
extra allocated space in container files. I ended up using a coarser
heuristic, but I'd like to merge this anyway as I think it'll be useful in
future as a more precise way of repairing extra allocated space.

Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Reviewed-on: http://gerrit.cloudera.org:8080/6583
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <da...@gmail.com>
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
4 files changed, 145 insertions(+), 3 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 12
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 4: Verified+1

Unrelated failure in security-itests.

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 908:   uint64_t found_offset = 0;
             :   for (const auto& e : extents) {
             :     if (e.second >= (fs_block_size * 3)) {
             :       found_offset = e.first + fs_block_size;
             :       break;
             :     }
             :   }
             :   if (found_offset == 0) {
             :     LOG(INFO) << "Couldn't find extent to split, skipping this part of the test";
             :     return;
             :   }
> If it's virtually impossible can we just fail and adjust later if we find a
Okay.


http://gerrit.cloudera.org:8080/#/c/6583/3/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

Line 888:     LOG(INFO) << "GetExtentMap() not supported, skipping test";
> nit, end sentence with colon
Removed, decided to use SCOPED_TRACE() instead.


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 732: TRACE_EVENT1("io", "PosixRWFile::GetExtentMap", "path", filename_);
> right, but for things that are done "online" no? isn't this supposed to alw
No, it's not just for offline tools; in the next patch you'll see it used within the LBM at startup.

Besides, I don't think we should try and draw distinctions between offline and online in util code. Usage evolves and it would be a shame for an IO heavy code path to be missing from a trace because we thought it'd be offline only in the past and forgot to update it when that changed.


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Todd Lipcon, Alexey Serbin,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6583

to look at the new patch set (#8).

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................

env: add RWFile::GetExtentMap for analyzing file extents

This patch introduces a method to get the extent metadata of a file provided
it resides on an extent-based filesystem (such as ext4 or xfs). Each extent
is an offset and length into the file, and represents a chunk of filesystem
that has been allocated for the file. Gaps between extents are expected to
be unallocated and may represent punched out holes.

On Linux, the extent listing is retrieved via repeated calls to the
FS_IOC_FIEMAP ioctl, though only some of the information returned is
actually used.

Originally I intended to use this in the log block manager for finding
extra allocated space in container files. I ended up using a coarser
heuristic, but I'd like to merge this anyway as I think it'll be useful in
future as a more precise way of repairing extra allocated space.

Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
4 files changed, 145 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6583/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 908:     }
             :   }
             :   if (found_offset == 0) {
             :     LOG(INFO) << "Couldn't find extent to split, skipping this part of the test";
             :     return;
             :   }
             : 
             :   // Punch out a hole and split the extent.
             :   s = f->PunchHole(found_offset, fs_block_size);
             :   if (s.IsNotSupported()) {
             :    
> It's virtually impossible. One of the following would have to be true (assu
If it's virtually impossible can we just fail and adjust later if we find a case where it's not? Just want to make sure that we're forced to deal with it in case our assumption is wrong.


http://gerrit.cloudera.org:8080/#/c/6583/3/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

Line 888:   LOG(INFO) << "Extent layout before hole punch";
nit, end sentence with colon


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env.h
File src/kudu/util/env.h:

PS1, Line 589: // platform doesn't support fetching extents at
> I'm not actually sure that it's the best approach for C++11. I think we sho
yeah, typedefs are quite traditional by now. fwiw I meat the location of the typedeffing


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 732: TRACE_EVENT1("io", "PosixRWFile::GetExtentMap", "path", filename_);
> It's traceable because I believe the convention for all env_posix.cc functi
right, but for things that are done "online" no? isn't this supposed to always run offline? when would you observe the trace?


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6583

to look at the new patch set (#4).

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................

env: add RWFile::GetExtentMap for analyzing file extents

This patch introduces a method to get the extent metadata of a file provided
it resides on an extent-based filesystem (such as ext4 or xfs). Each extent
is an offset and length into the file, and represents a chunk of filesystem
that has been allocated for the file. Gaps between extents are expected to
be unallocated and may represent punched out holes.

On Linux, the extent listing is retrieved via repeated calls to the
FS_IOC_FIEMAP ioctl, though only some of the information returned is
actually used.

I intend to use this in the log block manager for two purposes:
- Discovering whether a container has allocated space beyond its file size.
- Finding all of a container's "unpunched" holes, and repunching them.

Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
4 files changed, 145 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6583/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env-test.cc
File src/kudu/util/env-test.cc:

PS1, Line 896: LOG(INFO)
how big can this get? maybe VLOG?


PS1, Line 908:   uint64_t found_offset = 0;
             :   for (const auto& e : extents) {
             :     if (e.second >= (fs_block_size * 3)) {
             :       found_offset = e.first + fs_block_size;
             :       break;
             :     }
             :   }
             :   if (found_offset == 0) {
             :     LOG(INFO) << "Couldn't find extent to split, skipping this part of the test";
             :     return;
             :   }
how likely is this to happen? are we sure we won't be skipping this test most of the time?


PS1, Line 934: LOG(INFO)
maybe VLOG?


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env.h
File src/kudu/util/env.h:

PS1, Line 587:   // If the underlying filesystem does not support extents, map entries
             :   // represent runs of adjacent fixed-size filesystem blocks instead.
well in the apple case this does nothing, right? maybe elaborate a bit more?


PS1, Line 589: typedef std::map<uint64_t, uint64_t> ExtentMap;
I like this way of typedeffing method arguments, maybe we should add this to the guidelines or something


http://gerrit.cloudera.org:8080/#/c/6583/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

PS1, Line 732: TRACE_EVENT1("io", "PosixRWFile::GetExtentMap", "path", filename_);
why is this traceable? do we get extent maps online?


PS1, Line 764: LOG(WARNING) << Substitute(
             :               "Unexpected extent found at offset $0", fme[i].fe_logical);
log fatal/log error? is the data returned by this method still useful if this happens?


PS1, Line 767: bool inserted = InsertIfNotPresent(&extents,
             :                                            fme[i].fe_logical,
             :                                            fme[i].fe_length);
we can get repeated extents? if not InsertOrDie?


PS1, Line 771: // Two extents at the same logical offset? Shouldn't be possible.
             :           LOG(WARNING) << Substitute(
             :               "Multiple extents found at offset $0. Ignoring",
             :               fme[i].fe_logical);
same comment as the case above


PS1, Line 776: \n
why the \n?


-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6583

to look at the new patch set (#2).

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................

env: add RWFile::GetExtentMap for analyzing file extents

This patch introduces a method to get the extent metadata of a file provided
it resides on an extent-based filesystem (such as ext4 or xfs). Each extent
is an offset and length into the file, and represents a chunk of filesystem
that has been allocated for the file. Gaps between extents are expected to
be unallocated and may represent punched out holes.

On Linux, the extent listing is retrieved via repeated calls to the
FS_IOC_FIEMAP ioctl, though only some of the information returned is
actually used.

I intend to use this in the log block manager for two purposes:
- Discovering whether a container has allocated space beyond its file size.
- Finding all of a container's "unpunched" holes, and repunching them.

Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
---
M src/kudu/util/env-test.cc
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
4 files changed, 157 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6583/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] env: add RWFile::GetExtentMap for analyzing file extents

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: env: add RWFile::GetExtentMap for analyzing file extents
......................................................................


Patch Set 11: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6583
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I35bd1bdb9e1a839af2ab95ea73b79217c1f4a2b3
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No