Posted to reviews@impala.apache.org by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org> on 2019/03/25 19:39:56 UTC

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12846 )


Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step toward integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to use Hive 3.1.0 instead of CDH Hive
2.1.1.

To make sure that existing setups don't break, this option is enabled
via a command line argument to bin/impala-config.sh. This command line
argument (-use-hive3) sets up certain environment variables so that
Hive 3.1.0 based binaries can be used to instantiate the Hive services
(Hiveserver2 and metastore). The default is still Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
uses a different database name so that it is easy to switch between an
environment that uses the Hive 2.1.1 metastore and one that uses the
Hive 3.1.0 metastore. To do so, users should follow the steps below:

1. Open a new terminal
2. Run bin/bootstrap_toolchain.py
3. source bin/impala-config.sh -use-hive3
4. source bin/create-test-configuration.sh -create-metastore
   Provide "-create-metastore" only the first time, so that a new
   metastore db is created and the Hive 3.1.0 schema is initialized.
   For all subsequent invocations the "-create-metastore" argument can
   be skipped. The script should still be sourced, since the
   hive-site.xml of Hive 3.1.0 is slightly different from that of Hive
   2.1.1 and needs to be regenerated.
5. Start services using testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain
script should download them automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when the
argument is not provided.
3. Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in
Impala will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 374 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 11: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 21:06:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3962/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 21:06:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step toward integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to use Hive 3.1.0 instead of CDH Hive
2.1.1.

To make sure that existing setups don't break, this is enabled via an
environment variable override to bin/impala-config.sh. When the
environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (Hiveserver2 and metastore). The default is
still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
uses a different database name so that it is easy to switch between an
environment that uses the Hive 2.1.1 metastore and one that uses the
Hive 3.1.0 metastore.

To start a minicluster that uses Hive 3.1.0, users should follow the
steps below:

1. Make sure that the minicluster, if running, is stopped before you
run the following commands.
2. Open a new terminal and run the following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in the toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if CDP_BUILD_NUMBER has not changed and the cdp_components were
already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   Provide "-create-metastore" only the first time, so that a new
metastore db is created and the Hive 3.1.0 schema is initialized. For
all subsequent invocations the "-create-metastore" argument can be
skipped. The script should still be sourced, since the hive-site.xml of
Hive 3.1.0 is different from that of Hive 2.1.1 and needs to be
regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain
script should download them automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when the
argument is not provided.
3. Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in
Impala will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 298 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/7

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 10: Code-Review+2

(1 comment)

I saw a few +1s from Tim and Andrew, so I'm going to promote this to +2. This is a good start; we can always iterate on it to improve it further.

http://gerrit.cloudera.org:8080/#/c/12846/9/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12846/9/bin/create-test-configuration.sh@149
PS9, Line 149: 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1
> schematool has a problem which prints bunch of new lines on the stdout afte
Sounds good.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 21:05:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/bootstrap_toolchain.py@432
PS6, Line 432: def download_cdp_hive(toolchain_root):
flake8: E302 expected 2 blank lines, found 1




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 18:44:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 19:28:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step toward integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to use Hive 3.1.0 instead of CDH Hive
2.1.1.

To make sure that existing setups don't break, this option is enabled
via a command line argument to bin/impala-config.sh. This command line
argument (-use-hive3) sets up certain environment variables so that
Hive 3.1.0 based binaries can be used to instantiate the Hive services
(Hiveserver2 and metastore). The default is still Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
uses a different database name so that it is easy to switch between an
environment that uses the Hive 2.1.1 metastore and one that uses the
Hive 3.1.0 metastore. To do so, users should follow the steps below:

1. Open a new terminal
2. Run bin/bootstrap_toolchain.py
3. source bin/impala-config.sh -use-hive3
4. source bin/create-test-configuration.sh -create-metastore
   Provide "-create-metastore" only the first time, so that a new
   metastore db is created and the Hive 3.1.0 schema is initialized.
   For all subsequent invocations the "-create-metastore" argument can
   be skipped. The script should still be sourced, since the
   hive-site.xml of Hive 3.1.0 is slightly different from that of Hive
   2.1.1 and needs to be regenerated.
5. Start services using testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain
script should download them automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when the
argument is not provided.
3. Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in
Impala will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 372 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step toward integrating Impala with Hive 3.1.0, this patch
modifies the minicluster scripts to use Hive 3.1.0 instead of CDH Hive
2.1.1.

To make sure that existing setups don't break, this is enabled via an
environment variable override to bin/impala-config.sh. When the
environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and
extracts them in the toolchain directory. These binaries are used to
start the Hive services (Hiveserver2 and metastore). The default is
still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
uses a different database name so that it is easy to switch between an
environment that uses the Hive 2.1.1 metastore and one that uses the
Hive 3.1.0 metastore.

To start a minicluster that uses Hive 3.1.0, users should follow the
steps below:

1. Make sure that the minicluster, if running, is stopped before you
run the following commands.
2. Open a new terminal and run the following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in the toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if CDP_BUILD_NUMBER has not changed and the cdp_components were
already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   Provide "-create-metastore" only the first time, so that a new
metastore db is created and the Hive 3.1.0 schema is initialized. For
all subsequent invocations the "-create-metastore" argument can be
skipped. The script should still be sourced, since the hive-site.xml of
Hive 3.1.0 is different from that of Hive 2.1.1 and needs to be
regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain
script should download them automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when the
argument is not provided.
3. Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in
Impala will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 297 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/bootstrap_toolchain.py@434
PS4, Line 434: os.getenv("USE_CDP
do we have any utility code anywhere that's more permissive than this? I can see someone setting it to '1' and being very confused about why it's not working.
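
A minimal sketch of the kind of permissive check this comment asks about. The helper name env_flag and the set of accepted spellings are assumptions made for illustration; neither is taken from bootstrap_toolchain.py:

```python
import os

# Assumed set of spellings that count as "enabled"; adjust as needed.
_TRUE_VALUES = frozenset(["1", "true", "yes", "on"])


def env_flag(name, environ=None):
    """Return True if the environment variable `name` holds a truthy value.

    Hypothetical helper: comparison is case-insensitive and ignores
    surrounding whitespace. An unset or empty variable counts as False.
    """
    if environ is None:
        environ = os.environ
    value = environ.get(name, "")
    return value.strip().lower() in _TRUE_VALUES
```

With a helper like this, USE_CDP_HIVE=1 and USE_CDP_HIVE=TRUE would both enable the option, while an unset or empty variable would leave it off.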


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/bootstrap_toolchain.py@450
PS4, Line 450: present
maybe say 'set' here since it doesn't actually need to be present? (it will be makedirred below)


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/bootstrap_toolchain.py@466
PS4, Line 466:   # TODO the tar file name in the cdp build don't match with the version number. Hard
             :   #  coding the name here currently
             :   file_name = "{0}.tar.gz".format(dir_name)
is this TODO inaccurate? it looks like from the code here it does match.


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/create-test-configuration.sh@146
PS4, Line 146:  # Hive schema SQL scripts include other scripts using \i, which expects absolute paths.
             :   # Switch to the scripts directory to make this work.
             :   pushd ${HIVE_HOME}/bin
this pushd/popd is no longer relevant now that you're using schematool, right?


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@38
PS4, Line 38: for ARG in $*
nit: usually 'do' is on the same line


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@41
PS4, Line 41:     -use-hive3)
I think '--' instead of '-' is more common for long arg names


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@44
PS4, Line 44:     -help)
same


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@49
PS4, Line 49:   esac
do you want a default case here that prints usage info? otherwise a typo in the args would just be silently ignored.


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@310
PS4, Line 310:   export METASTORE_DB=${METASTORE_DB-"$(cut -c-63 <<< HMS$ESCAPED_IMPALA_HOME)_cdp"}
I'm assuming the 63-character 'cut' here is because of a 63-character limit in db names in postgres or something. Given that, I guess we need to cut to 59 instead of 63?
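
The arithmetic behind this comment can be checked with a short Python sketch. It assumes the limit in question is PostgreSQL's default 63-byte maximum identifier length and that the name is plain ASCII. metastore_db_name is a hypothetical stand-in for the bash expression, not code from the patch:

```python
# Assumption: PostgreSQL's default maximum identifier length (63 bytes).
MAX_IDENTIFIER_LEN = 63
SUFFIX = "_cdp"


def metastore_db_name(escaped_impala_home, suffix=SUFFIX):
    """Build 'HMS<escaped home><suffix>', capped at MAX_IDENTIFIER_LEN chars.

    The prefix is truncated first so that the suffix always survives.
    """
    budget = MAX_IDENTIFIER_LEN - len(suffix)  # 63 - 4 = 59
    return ("HMS" + escaped_impala_home)[:budget] + suffix
```

Truncating the prefix to 59 characters leaves room for the 4-character _cdp suffix, so the full name stays within 63 characters. Cutting at 63 and then appending the suffix could produce a name of up to 67 characters.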


http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@767
PS4, Line 767:   echo "IMPALA_HIVE_VERSION     = $IMPALA_HIVE_VERSION"
nit: indentation off


http://gerrit.cloudera.org:8080/#/c/12846/4/fe/src/test/resources/postgresql-hive-site.xml.cdp.template
File fe/src/test/resources/postgresql-hive-site.xml.cdp.template:

http://gerrit.cloudera.org:8080/#/c/12846/4/fe/src/test/resources/postgresql-hive-site.xml.cdp.template@99
PS4, Line 99: <!--property>
noticed a bunch of things commented out in this file. Should we remove the things that aren't actually used?


http://gerrit.cloudera.org:8080/#/c/12846/4/fe/src/test/resources/postgresql-hive-site.xml.cdp.template@117
PS4, Line 117:   <name>hive.metastore.rawstore.impl</name>
this is the default, right?


http://gerrit.cloudera.org:8080/#/c/12846/4/fe/src/test/resources/postgresql-hive-site.xml.cdp.template@123
PS4, Line 123:   <name>dfs.replication</name>
shouldn't be necessary since this is picked up from hdfs-site, right?


http://gerrit.cloudera.org:8080/#/c/12846/4/fe/src/test/resources/postgresql-hive-site.xml.cdp.template@145
PS4, Line 145: <!-- Having problems getting the metastore up with Kerberos.  Defer for now.
It seems this is copy-pasted from the old hive-site.xml.template and we've been deferring it since 2014. At this point maybe we should just file a JIRA about the known gap in testing, and remove this commented-out stuff?


http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh@73
PS4, Line 73:     if [[ -n "$SENTRY_HOME" ]]
can combine with the previous line with an &&


http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh@74
PS4, Line 74:   then
usually this goes on the same line as 'if'


http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh@75
PS4, Line 75:     for f in ${SENTRY_HOME}/lib/*.jar ${SENTRY_HOME}/lib/plugins/*.jar; do
this seems likely to be a bit wider than necessary. Are we spewing stuff onto the Hive classpath that might cause problems?

Not a huge deal since this is relatively temporary, but I wonder if there is a more restrictive way to get the transitive classpath dependencies of just the sentry plugin rather than all of sentry.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Mar 2019 21:31:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 11: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Mar 2019 01:52:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2535/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 25 Mar 2019 19:57:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12846/3/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12846/3/bin/bootstrap_toolchain.py@465
PS3, Line 465: p
flake8: F841 local variable 'platform_label' is assigned to but never used


http://gerrit.cloudera.org:8080/#/c/12846/3/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12846/3/bin/create-test-configuration.sh@132
PS3, Line 132:   # Certain configurations (like SentrySyncHMSNotificationsPostListener) does not work with HMS 3.1.0
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/12846/3/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/12846/3/testdata/bin/run-hive-server.sh@66
PS3, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-verbose:class -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=30010"
line too long (121 > 90)


http://gerrit.cloudera.org:8080/#/c/12846/3/testdata/bin/run-hive-server.sh@69
PS3, Line 69: # CDH Hive metastore scripts do not do so. This is currently to make sure that we can run all the tests
line too long (103 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 25 Mar 2019 19:41:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2550/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 20:55:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step to integrate Impala with Hive 3.1.0 this patch modifies
the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and extracts
them in the toolchain directory. These binaries are used to start the Hive
services (HiveServer2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
an environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0, users should
follow the steps below:

1. Make sure that the minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run the following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and if the cdp_components
are already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   Pass "-create-metastore" only the first time, so that a new metastore
db is created and the Hive 3.1.0 schema is initialized. For all
subsequent invocations, the "-create-metastore" argument can be skipped.
The script should still be sourced, since the hive-site.xml of Hive 3.1.0
is different from that of Hive 2.1.1 and needs to be regenerated.

> testdata/bin/run-all.sh
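
The steps above could be wrapped in a small helper, sketched here under a
few assumptions: IMPALA_HOME is assumed to point at the Impala checkout, and
the function name start_hive3_minicluster and its --first-time flag are
hypothetical, not part of this patch. The function has to be defined in the
current shell so that the sourced scripts affect that shell.

```shell
# Sketch only: start_hive3_minicluster and --first-time are hypothetical names.
start_hive3_minicluster() {
  # IMPALA_HOME is assumed to point at the Impala checkout.
  if [ -z "${IMPALA_HOME:-}" ]; then
    echo "IMPALA_HOME is not set" >&2
    return 1
  fi
  cd "${IMPALA_HOME}" || return 1

  # Only request a fresh metastore db when the caller asks for it.
  create_flag=""
  if [ "${1:-}" = "--first-time" ]; then
    create_flag="-create-metastore"
  fi
  # Clear positional parameters so sourced scripts do not inherit them.
  set --

  export USE_CDP_HIVE=true
  source bin/impala-config.sh || return 1
  bin/bootstrap_toolchain.py || return 1
  # create_flag is left unquoted on purpose: empty means "no argument".
  source bin/create-test-configuration.sh ${create_flag} || return 1
  testdata/bin/run-all.sh
}
```

A first run would then be "start_hive3_minicluster --first-time", and later
runs would omit the flag. Note that the function changes the working
directory of the calling shell.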

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain script
should do this automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
the option is not provided.
3. The Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in Impala
will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Reviewed-on: http://gerrit.cloudera.org:8080/12846
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 284 insertions(+), 11 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@37
PS4, Line 37: # parse command line options
> It seems like options to impala-config.sh are currently passed by environme
Yeah, I think it would be best to avoid making this a special option that behaves differently to everything else. A lot of scripts source impala-config.sh without arguments. This works today for the two valid ways to set options - via environment variables or by setting them in impala-config-local.sh/impala-config-branch.sh
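
A minimal sketch of the environment-variable convention described above,
assuming a sourced config script that falls back to a default when the
variable is unset. USE_CDP_HIVE is the name used in the merged commit
message; select_hive_flavor and the "hive2"/"hive3" labels are hypothetical
and only illustrate the pattern, not what impala-config.sh actually does.

```shell
# Sketch only: select_hive_flavor is a hypothetical helper.
select_hive_flavor() {
  # Treat anything other than the literal string "true" as "not enabled",
  # so callers that never set the variable keep the existing default.
  if [ "${USE_CDP_HIVE:-false}" = "true" ]; then
    echo "hive3"
  else
    echo "hive2"
  fi
}

echo "Selected Hive flavor: $(select_hive_flavor)"
```

With this shape, scripts that source the config without arguments keep
working, and a developer opts in with "export USE_CDP_HIVE=true" beforehand.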



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Mar 2019 23:34:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2536/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Mar 2019 20:59:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/testdata/bin/run-hive-server.sh@69
PS4, Line 69: # CDH Hive metastore scripts do not do so. This is currently to make sure that we can run all
line too long (93 > 90)



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Mar 2019 20:15:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 7: Code-Review+1

(5 comments)

a few nits but looks good

http://gerrit.cloudera.org:8080/#/c/12846/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12846/6//COMMIT_MSG@10
PS6, Line 10: the minicluster scripts to use Hive 3.1.0 instead of CDH Hive 2.1.1.
Nit: to optionally use


http://gerrit.cloudera.org:8080/#/c/12846/6/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/create-test-configuration.sh@132
PS6, Line 132:   # Certain configurations (like SentrySyncHMSNotificationsPostListener) does not work
s/does not/do not/


http://gerrit.cloudera.org:8080/#/c/12846/6/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/impala-config.sh@163
PS6, Line 163: export CDH_BUILD_NUMBER=909265
Nit: I wonder if CDP_BUILD_NUMBER should go here so that the 2 build numbers are together


http://gerrit.cloudera.org:8080/#/c/12846/6/bin/impala-config.sh@534
PS6, Line 534:   export HIVE_HOME="$CDP_COMPONENTS_HOME/apache-hive-${IMPALA_HIVE_VERSION}-bin"
Why in one case is the home apache-hive-xxx and in the other it is hive-xxx ?


http://gerrit.cloudera.org:8080/#/c/12846/6/bin/impala-config.sh@748
PS6, Line 748:   echo "CDP_BUILD_NUMBER        = $CDP_BUILD_NUMBER"
It could be confusing that you always echo CDH_BUILD_NUMBER but you only echo CDP_BUILD_NUMBER when doing a CDP build.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 19:12:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12846/9/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12846/9/bin/bootstrap_toolchain.py@433
PS9, Line 433: def download_cdp_hive(toolchain_root):
We don't have to do it now, but at some point, we should refactor this function to be more generic, like downloading Ranger, Hive, etc.


http://gerrit.cloudera.org:8080/#/c/12846/9/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12846/9/bin/create-test-configuration.sh@132
PS9, Line 132: # Certain configurations (like SentrySyncHMSNotificationsPostListener) do not work
             :   # with HMS 3.1.0. Use a cdp specific configuration template
             :   generate_config postgresql-hive-site.xml.cdp.template hive-site.xml
will this cause Sentry tests to fail when USE_CDP_HIVE=true?


http://gerrit.cloudera.org:8080/#/c/12846/9/bin/create-test-configuration.sh@149
PS9, Line 149: 1>${IMPALA_CLUSTER_LOGS_DIR}/schematool.log 2>&1
it may be better to use tee

2>&1 |  tee ${IMPALA_CLUSTER_LOGS_DIR}/schematool.log
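
A small sketch of the tee variant, assuming a stand-in for the real
schematool invocation. run_schematool and LOG_DIR are hypothetical; LOG_DIR
is a throwaway directory used here in place of IMPALA_CLUSTER_LOGS_DIR.

```shell
# Hypothetical stand-in for the schematool call; it writes to both streams.
run_schematool() {
  echo "schema initialized"
  echo "sample warning" >&2
}

# Throwaway log directory, used here instead of IMPALA_CLUSTER_LOGS_DIR.
LOG_DIR="$(mktemp -d)"

# Merge stderr into stdout, then show the output on the terminal and save a copy.
run_schematool 2>&1 | tee "${LOG_DIR}/schematool.log"
```

One trade-off to keep in mind: the exit status of a pipeline is that of its
last command (tee), so a failing schematool run would go unnoticed unless
the script checks for it, for example with "set -o pipefail" in bash.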


http://gerrit.cloudera.org:8080/#/c/12846/9/fe/src/test/resources/postgresql-hive-site.xml.cdp.template
File fe/src/test/resources/postgresql-hive-site.xml.cdp.template:

http://gerrit.cloudera.org:8080/#/c/12846/9/fe/src/test/resources/postgresql-hive-site.xml.cdp.template@27
PS9, Line 27: <property>
nit: formatting is off in this file, a lot of mixed 1 space vs 2 spaces.


http://gerrit.cloudera.org:8080/#/c/12846/9/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/12846/9/testdata/bin/run-hive-server.sh@72
PS9, Line 72: if [[ $USE_CDP_HIVE && -n "$SENTRY_HOME" ]]; then
            :     for f in ${SENTRY_HOME}/lib/sentry-binding-hive*.jar; do
            :         FILE_NAME=$(basename $f)
            :         # exclude all the hive jars from being included in the classpath since Sentry
            :         # depends on Hive 2.1.1
            :         if [[ ! $FILE_NAME == hive* ]]; then
            :          export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${f}
            :         fi
            :     done
            : fi
nit: use 2 spaces
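
For illustration, the same loop with two-space indentation might look like
the sketch below. It is not the patch's code: the temporary SENTRY_HOME and
the jar names are hypothetical fixtures so the snippet runs on its own, the
comparison against "true" is an assumption about the intended check, and
"case" replaces the [[ ]] pattern match so the sketch also runs under a
POSIX sh.

```shell
# Hypothetical fixture: a throwaway SENTRY_HOME holding two fake jars.
SENTRY_HOME="$(mktemp -d)"
mkdir -p "${SENTRY_HOME}/lib"
touch "${SENTRY_HOME}/lib/sentry-binding-hive-example.jar" \
  "${SENTRY_HOME}/lib/sentry-binding-hive-common-example.jar"
USE_CDP_HIVE=true
HADOOP_CLASSPATH=""

if [ "${USE_CDP_HIVE}" = "true" ] && [ -n "${SENTRY_HOME}" ]; then
  for f in "${SENTRY_HOME}"/lib/sentry-binding-hive*.jar; do
    file_name="$(basename "${f}")"
    # Skip jars whose names start with "hive", as in the quoted snippet,
    # since Sentry depends on Hive 2.1.1.
    case "${file_name}" in
      hive*) ;;
      *) HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${f}" ;;
    esac
  done
fi
export HADOOP_CLASSPATH
echo "${HADOOP_CLASSPATH}"
```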



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 01:50:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Andrew Sherman (Code Review)" <ge...@cloudera.org>.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/12846/4/bin/impala-config.sh@37
PS4, Line 37: # parse command line options
It seems like options to impala-config.sh are currently passed by environment variable, for example USE_KUDU_DEBUG_BUILD can be set before sourcing impala-config.sh. Is there a particular reason you chose to add arguments to impala-config.sh rather than using the existing mechanism?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 25 Mar 2019 22:52:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12846/6/bin/bootstrap_toolchain.py@432
PS6, Line 432: 
> flake8: E302 expected 2 blank lines, found 1
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 18:56:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2548/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 26 Mar 2019 19:27:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 9:

Thanks for addressing my feedback. I'll let someone else review and give the +2.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 01:18:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step to integrate Impala with Hive 3.1.0 this patch modifies
the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and extracts
them in the toolchain directory. These binaries are used to start the Hive
services (HiveServer2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
an environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0, users should
follow the steps below:

1. Make sure that the minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run the following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and if the cdp_components
are already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   Pass "-create-metastore" only the first time, so that a new metastore
db is created and the Hive 3.1.0 schema is initialized. For all
subsequent invocations, the "-create-metastore" argument can be skipped.
The script should still be sourced, since the hive-site.xml of Hive 3.1.0
is different from that of Hive 2.1.1 and needs to be regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain script
should do this automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
the option is not provided.
3. The Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in Impala
will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 303 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................

IMPALA-8345 : Add option to set up minicluster to use Hive 3

As a first step to integrate Impala with Hive 3.1.0 this patch modifies
the minicluster scripts to optionally use Hive 3.1.0 instead of
CDH Hive 2.1.1.

In order to make sure that existing setups don't break, this is
enabled via an environment variable override to bin/impala-config.sh.
When the environment variable USE_CDP_HIVE is set to true, the
bootstrap_toolchain script downloads the Hive 3.1.0 tarballs and extracts
them in the toolchain directory. These binaries are used to start the Hive
services (HiveServer2 and metastore). The default is still CDH Hive 2.1.1.

Also, since Hive 3.1.0 uses an upgraded metastore schema, this patch
makes use of a different database name so that it is easy to switch from
an environment which uses the Hive 2.1.1 metastore to another
which uses the Hive 3.1.0 metastore.

In order to start a minicluster which uses Hive 3.1.0, users should
follow the steps below:

1. Make sure that the minicluster, if running, is stopped
before you run the following commands.
2. Open a new terminal and run the following commands.
> export USE_CDP_HIVE=true
> source bin/impala-config.sh
> bin/bootstrap_toolchain.py
  The above command downloads the Hive 3.1.0 tarballs and extracts them
in toolchain/cdp_components-${CDP_BUILD_NUMBER} directory. This is a
no-op if the CDP_BUILD_NUMBER has not changed and if the cdp_components
are already downloaded by a previous invocation of the script.

> source bin/create-test-configuration.sh -create-metastore
   Pass "-create-metastore" only the first time, so that a new metastore
db is created and the Hive 3.1.0 schema is initialized. For all
subsequent invocations, the "-create-metastore" argument can be skipped.
The script should still be sourced, since the hive-site.xml of Hive 3.1.0
is different from that of Hive 2.1.1 and needs to be regenerated.

> testdata/bin/run-all.sh

Note that the testing was performed locally by downloading the Hive 3.1
binaries into
toolchain/cdp_components-976603/apache-hive-3.1.0.6.0.99.0-9-bin. Once
the binaries are available in the S3 bucket, the bootstrap_toolchain script
should do this automatically.

Testing Done:
1. Made sure that the cluster comes up with Hive 3.1 when the steps
above are performed.
2. Made sure that existing scripts work as they do currently when
the option is not provided.
3. The Impala cluster comes up and connects to HMS 3.1.0. (Note that Impala
still uses the Hive 2.1.1 client. Upgrading the client libraries in Impala
will be done as a separate change.)

Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
---
M bin/bootstrap_toolchain.py
M bin/create-test-configuration.sh
M bin/impala-config.sh
A fe/src/test/resources/postgresql-hive-site.xml.cdp.template
M testdata/bin/run-hive-server.sh
5 files changed, 284 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/12846/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8345 : Add option to set up minicluster to use Hive 3

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12846 )

Change subject: IMPALA-8345 : Add option to set up minicluster to use Hive 3
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2566/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icfed856c1f5429ed45fd3d9cb08a5d1bb96a9605
Gerrit-Change-Number: 12846
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <as...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Wed, 27 Mar 2019 21:45:13 +0000
Gerrit-HasComments: No