You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2016/06/08 01:11:45 UTC

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3333

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
Currently, the CDH components in S3 are snapshots from the
thirdparty directory at 688d0efcd38731e8e27a8236dbdca21c8fd571a1.
Once the integration jenkins job is modified to periodically
update the S3 buckets, we can remove the thirdparty directory
and always use the CDH components in the toolchain directory.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
4 files changed, 120 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 288: def download_cdh_components(toolchain_root, cdh_components):
> Updated the script to not download the package at all if the package direct
Thanks, I like the idea of the md5sum as an option, but this is easier to use as a default.


http://gerrit.cloudera.org:8080/#/c/3333/4/buildall.sh
File buildall.sh:

Line 252:   $IMPALA_HOME/bin/bootstrap_toolchain.py
Don't need to comment on this specifically - we have set -e as part of set -euo pipefail enabled in all of our scripts.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
Currently, the CDH components in S3 are snapshots from the
thirdparty directory at 688d0efcd38731e8e27a8236dbdca21c8fd571a1.
Once the integration jenkins job is modified to periodically
update the S3 buckets, we can remove the thirdparty directory
and always use the CDH components in the toolchain directory.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 104 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 15: # Bootstrapping the native toolchain with prebuilt binaries
> This comment and the script name is getting increasingly inaccurate now tha
Comments updated. Keep the name of the script the same for now.


Line 288: def download_cdh_components(toolchain_root, cdh_components):
> I have SKIP_TOOLCHAIN_BOOTSTRAP set to true in my environment, so I guess i
Updated the script to not download the package at all if the package directory exists. Will not use the md5sum files anymore.


Line 342: 
> Extra line in comment.
Done


http://gerrit.cloudera.org:8080/#/c/3333/2/bin/impala-config.sh
File bin/impala-config.sh:

Line 49: : ${SKIP_TOOLCHAIN_BOOTSTRAP=false}
> Maybe a one-liner comment?
Done


http://gerrit.cloudera.org:8080/#/c/3333/2/buildall.sh
File buildall.sh:

Line 253:   if [ "$?" == "0" ]; then
> More concisely:
Done. Don't need the if-statement at all as you said below.


Line 255:   else
> Actually, set -e already fails the script if the bootstrap_toolchain.py com
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 7: Code-Review+2

Thank Sailesh. I ran some more private builds after picking up the fix. Carry +2 forward.

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 15: # Bootstrapping the native toolchain with prebuilt binaries
This comment and the script name is getting increasingly inaccurate now that it's doing more things than just downloading prebuilt binaries. 

Let's update the comment.

Should we rename the script to something like download_dependencies.py? Ok to ignore.


Line 288: def download_cdh_components(toolchain_root, cdh_components):
bootstrap_toolchain can be run every time buildall.sh is run (for clean or incremental builds), so it seems like a problem to depend on the internet, e.g. if I want to develop offline or just don't want the additional build latency of the network round-trips. I run buildall.sh pretty frequently in my workflow, for example.

The native toolchain bootstrap logic only downloads missing versions of packages, so it only needs the internet for the initial bootstrap. 

People could toggle the value of DOWNLOAD_CDH_COMPONENTS to get it to do the desired thing, but we don't generally use environment variables that way in the build system.

So I think by default we should use the download-only-if-missing behaviour. The md5sum-checking behaviour seems useful in some cases, but I think it should be opt-in.


Line 342: 
Extra line in comment.


http://gerrit.cloudera.org:8080/#/c/3333/2/bin/impala-config.sh
File bin/impala-config.sh:

Line 49: : ${SKIP_TOOLCHAIN_BOOTSTRAP=false}
Maybe a one-liner comment?


http://gerrit.cloudera.org:8080/#/c/3333/2/buildall.sh
File buildall.sh:

Line 253:   if [ "$?" == "0" ]; then
More concisely:

if $IMPALA_HOME/bin/bootstrap_toolchain.py; then


Line 255:   else
Actually, set -e already fails the script if the bootstrap_toolchain.py command returns non-zero, so the else branch is dead code here.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Jim Apple, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3333

to look at the new patch set (#7).

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree(https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git),
set $DOWNLOAD_CDH_COMPONENTS to true. Currently, the CDH
components in S3 are snapshots from the thirdparty directory
at 688d0efcd38731e8e27a8236dbdca21c8fd571a1. Once the integration
jenkins job (impala-cdh5-trunk-core-integration) is modified
to upload the latest stable builds to the S3 buckets, we can
remove the thirdparty directory and always use the CDH components
in the toolchain directory.

Note that bootstrap_toolchain.py will not overwrite existing
directories in the toolchain directory. To force a refresh of
cpmponents in the toolchain directory, a user should delete the
cached copy in the toolchain directory and execute
bootstrap_toolchain.py again. This behavior allows users to
develop locally without network connection once the toolchain
has been bootstrapped.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 107 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3333

to look at the new patch set (#5).

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree(https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git),
set $DOWNLOAD_CDH_COMPONENTS to true. Currently, the CDH
components in S3 are snapshots from the thirdparty directory
at 688d0efcd38731e8e27a8236dbdca21c8fd571a1. Once the integration
jenkins job (impala-cdh5-trunk-core-integration) is modified
to upload the latest stable builds to the S3 buckets, we can
remove the thirdparty directory and always use the CDH components
in the toolchain directory.

Note that bootstrap_toolchain.py will not overwrite existing
directories in the toolchain directory. To force a refresh of
cpmponents in the toolchain directory, a user should delete the
cached copy in the toolchain directory and execute
bootstrap_toolchain.py again. This behavior allows users to
develop locally without network connection once the toolchain
has been bootstrapped.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 107 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#2).

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
Currently, the CDH components in S3 are snapshots from the
thirdparty directory at 688d0efcd38731e8e27a8236dbdca21c8fd571a1.
Once the integration jenkins job is modified to periodically
update the S3 buckets, we can remove the thirdparty directory
and always use the CDH components in the toolchain directory.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 121 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree(https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git),
set $DOWNLOAD_CDH_COMPONENTS to true. Currently, the CDH
components in S3 are snapshots from the thirdparty directory
at 688d0efcd38731e8e27a8236dbdca21c8fd571a1. Once the integration
jenkins job (impala-cdh5-trunk-core-integration) is modified
to upload the latest stable builds to the S3 buckets, we can
remove the thirdparty directory and always use the CDH components
in the toolchain directory.

Note that bootstrap_toolchain.py will not overwrite existing
directories in the toolchain directory. To force a refresh of
cpmponents in the toolchain directory, a user should delete the
cached copy in the toolchain directory and execute
bootstrap_toolchain.py again. This behavior allows users to
develop locally without network connection once the toolchain
has been bootstrapped.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Reviewed-on: http://gerrit.cloudera.org:8080/3333
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Internal Jenkins
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 107 insertions(+), 61 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Internal Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 6: Code-Review+2

GVM will most likely fail because of  IMPALA-3754 whose fix has been checked in but hasn't been replicated to internal git repos due to IOPS-3710.

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 6:

> GVM will most likely fail because of  IMPALA-3754 whose fix has
 > been checked in but hasn't been replicated to internal git repos
 > due to IOPS-3710.

Looks like all the pending patches just made it through. I don't know how or who fixed it but IMPALA-3754 is in now.

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3333/4//COMMIT_MSG
Commit Message:

Line 17: To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
Can you point to the ASF repo: https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=summary


Line 20: Once the integration jenkins job is modified to periodically
Can you elaborate here - which job, specifically?

What does it do when it modifies the S3 buckets? How does it know they need updating?

In the long run, for users who do not want to use CDH components, should they download their own components and put them in thirdparty?


http://gerrit.cloudera.org:8080/#/c/3333/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 91: def bootstrap(toolchain_root, packages):
Can you give this function a docstring?


Line 320:   $IMPALA_TOOLCHAIN.
This docstring looks outdated to me, because of your new changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 4: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/3333/4//COMMIT_MSG
Commit Message:

Line 17: To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
> Can you point to the ASF repo: https://git-wip-us.apache.org/repos/asf?p=in
Done


Line 20: Once the integration jenkins job is modified to periodically
> Can you elaborate here - which job, specifically?
Details of the jenkins job is added. The followings are replies to other questions. They are details of our internal process so I won't add that to the commit message.

The existing integration jenkins job will periodically download the latest CDH components and run Impala tests along with these new components. If all tests pass, they will be checked into the thirdparty directory and overwrite the previous version checked in. For the new procedure, the jenkins job will do the same testing as before. The only difference is that it will upload the latest stable version of CDH components to the S3 bucket. The details of which has to be worked out as it involves some atomic renaming.

If a user wants to run their own version of CDH components (e.g. ASF version of Hadoop), that's perfectly fine but they may need to update impala-config.sh with the appropriate version string of Hadoop he/she is using. There is no guarantee it will pass all the Impala tests with such configuration as we don't plan to test (and officially support) such configuration.


http://gerrit.cloudera.org:8080/#/c/3333/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 91: def bootstrap(toolchain_root, packages):
> Can you give this function a docstring?
Done


Line 320:   $IMPALA_TOOLCHAIN.
> This docstring looks outdated to me, because of your new changes.
Done


http://gerrit.cloudera.org:8080/#/c/3333/4/buildall.sh
File buildall.sh:

Line 252:   # set -e was done above so any failure will terminate this script.
> Don't need to comment on this specifically - we have set -e as part of set 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 288: def download_cdh_components(toolchain_root, cdh_components):
> I share your concern and I also tried to come up with a way to not download
I have SKIP_TOOLCHAIN_BOOTSTRAP set to true in my environment, so I guess it wouldn't necessarily affect me.

I think I prefer the first option too.

It may also be worth thinking about what we should do if the md5sum is unavailable (e.g. infra outage). It seems like it's not necessarily worth aborting the build if there's already some package present.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has uploaded a new patch set (#3).

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................

IMPALA-3223: Supports download of CDH components from S3.

This change updates the toolchain bootstrapping script
to download the CDH components (hadoop, hbase, hive, llama,
llama-minikdc and sentry) from the toolchain S3 bucket to
the toolchain directory if the environment variable
$DOWNLOAD_CDH_COMPONENTS is true. By default, it is false
which means the CDH components in the thirdparty directory
will be used instead.

To build the ASF tree, set $DOWNLOAD_CDH_COMPONENTS to true.
Currently, the CDH components in S3 are snapshots from the
thirdparty directory at 688d0efcd38731e8e27a8236dbdca21c8fd571a1.
Once the integration jenkins job is modified to periodically
update the S3 buckets, we can remove the thirdparty directory
and always use the CDH components in the toolchain directory.

Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
---
M CMakeLists.txt
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M buildall.sh
M testdata/cluster/node_templates/cdh5/etc/init.d/llama-application
5 files changed, 103 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/33/3333/3
-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 4:

Oh, and it would be good if the commit message included a description of the outcome of the discussion about how to developwhen offline.

In particular - what is the default behavior of the script, how can it be overloaded, and what happens if connectivity is absent.

-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3223: Supports download of CDH components from S3.

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change.

Change subject: IMPALA-3223: Supports download of CDH components from S3.
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3333/2/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

Line 288: def download_cdh_components(toolchain_root, cdh_components):
> bootstrap_toolchain can be run every time buildall.sh is run (for clean or 
I share your concern and I also tried to come up with a way to not download the md5sum file for every incremental build.

The major difference between the CDH components and other pre-existing binaries in the toolchain directory is that we don't really have a good versioning system for the CDH components. Up to this point, the way it works is that the integration Jenkins job will check the latest approved version of the CDH components into our git repos. A git fetch will pick up the latest version. However, the CDH components will have the same version string so it's hard to tell if the version cached locally is stale or not. 

It's unclear to me if we really need to get the latest CDH components for our day-to-day development. May be it's already sufficient to have our Jenkins jobs do that as they always bootstrap the toolchain from scratch for each run. It would be great if others can chime in on this point.

If we can agree that it's unnecessary to always download the latest version of CDH components, this function can just skip downloading the component if it exists locally already. On the other hand, if we want to preserve the existing behavior, we may consider storing a versioning file in the CDH components directory and download only if we are behind.

Another way to work around the repeated downloading problem is to set SKIP_TOOLCHAIN_BOOTSTRAP to true. That may be particularly useful in disconnected environment.

With all the above said, I prefer the first option which is to download only if it's missing.


-- 
To view, visit http://gerrit.cloudera.org:8080/3333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa79db0005554cc0a116e74775647ba99f8dda
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes