Posted to reviews@impala.apache.org by "Philip Zeyliger (Code Review)" <ge...@cloudera.org> on 2018/10/25 03:45:30 UTC

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11782 )


Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................

test-with-docker: decrease image size by "de-duping" HDFS.

This change shaves about 20GB off the (uncompressed) Docker
image for test-with-docker, taking it from ~60GB to ~40GB.
Compressed, the image ends up being about 14GB.

To do this, we cheat: HDFS stores three replicas of every block, so the
image holds three identical copies of each one. Before committing the
image, we simply hard-link the copies together, which happens to work.
This relies on an HDFS implementation detail (the block files aren't,
say, appended to in place), but I think the savings in time and disk
space are worth the trade-off.
Because the image is smaller, it takes less time to "docker commit" it.
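
As a rough illustration of the hard-linking idea, here is a minimal
sketch. It is not the code added to docker/entrypoint.sh: the function
name dedup_hardlink is hypothetical, and the sketch assumes GNU find and
sha256sum, that all copies live on one filesystem (hard links cannot
cross filesystems), and that file names contain no newlines or
backslashes.

```shell
# Sketch only: dedup_hardlink is a hypothetical helper, not the code from
# docker/entrypoint.sh. Assumes GNU find and sha256sum, one filesystem, and
# file names without newlines, backslashes, or leading/trailing whitespace.
dedup_hardlink() (
  root=$1
  prev_sum=
  prev_path=
  # Sorting by checksum puts byte-identical files on adjacent lines.
  find "$root" -type f -exec sha256sum {} + | LC_ALL=C sort |
  while read -r sum path; do
    if [ "$sum" = "$prev_sum" ]; then
      # Same content as the previous file: make this name share its inode,
      # unless the two names are already hard links to each other.
      [ "$prev_path" -ef "$path" ] || ln -f "$prev_path" "$path"
    else
      prev_sum=$sum
      prev_path=$path
    fi
  done
)
```

Calling dedup_hardlink on the directory that holds the replicas leaves
every file name in place but stores each distinct content only once.
Writing through one name afterwards would change all linked names, which
is why the approach depends on the files not being modified in place.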

Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
---
M docker/entrypoint.sh
1 file changed, 18 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/11782/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11782
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3722/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Feb 2019 21:23:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/1153/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 04:28:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 3: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Feb 2019 01:21:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................

test-with-docker: decrease image size by "de-duping" HDFS.

This change shaves about 20GB off the (uncompressed) Docker
image for test-with-docker, taking it from ~60GB to ~40GB.
Compressed, the image ends up being about 14GB.

To do this, we cheat: HDFS stores three replicas of every block, so the
image holds three identical copies of each one. Before committing the
image, we simply hard-link the copies together, which happens to work.
This relies on an HDFS implementation detail (the block files aren't,
say, appended to in place), but I think the savings in time and disk
space are worth the trade-off.
Because the image is smaller, it takes less time to "docker commit" it.

Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Reviewed-on: http://gerrit.cloudera.org:8080/11782
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M docker/entrypoint.sh
1 file changed, 18 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 1:

I think this change makes sense. Can you rebase?


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Feb 2019 18:06:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
>
> You know what somebody once said about shell scripts getting too
> big?

Yes, I definitely think this is crossing that threshold, but I don't want to rewrite it quite yet.

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh
File docker/entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh@228
PS1, Line 228:   set +x
> Do you need to keep this?
Yes, this avoids a lot of verbosity for every copy. It's undone in line 238.
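
For readers unfamiliar with the pattern: "set +x" turns off shell command
tracing and "set -x" turns it back on. A minimal sketch of wrapping a
noisy step this way follows. The helper name run_untraced is hypothetical
and is not part of docker/entrypoint.sh, which, per the reply above,
toggles tracing inline instead.

```shell
# Sketch only: run_untraced is a hypothetical helper, not taken from
# docker/entrypoint.sh.
run_untraced() {
  case $- in
    *x*)
      set +x            # stop tracing; only this one line is echoed
      "$@"
      _rc=$?
      set -x            # restore tracing for the rest of the script
      ;;
    *)
      "$@"              # tracing was already off; leave it off
      _rc=$?
      ;;
  esac
  return "$_rc"
}
```

With tracing on, "run_untraced some_function" logs the call itself but
none of the commands that some_function runs, and the exit status of the
wrapped command is passed through.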



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 31 Jan 2019 22:47:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 3: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Feb 2019 21:23:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 2: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Feb 2019 21:13:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 2:

> I think this change makes sense. Can you rebase?

done


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Feb 2019 18:16:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

Posted by "Michal Ostrowski (Code Review)" <ge...@cloudera.org>.
Michal Ostrowski has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
......................................................................


Patch Set 1:

(1 comment)

You know what somebody once said about shell scripts getting too big?

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh
File docker/entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh@228
PS1, Line 228:   set +x
Do you need to keep this?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Oct 2018 12:52:22 +0000
Gerrit-HasComments: Yes