Posted to reviews@impala.apache.org by "Laszlo Gaal (Code Review)" <ge...@cloudera.org> on 2023/12/01 17:58:07 UTC

[Impala-ASF-CR] IMPALA-11157: Use native-toolchain hadoop for aarch64

Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/20737 )

Change subject: IMPALA-11157: Use native-toolchain hadoop for aarch64
......................................................................


Patch Set 2:

(2 comments)

I've tried running a core-mode test build with this change, but dataload seems to be quite flaky:
- https://jenkins.impala.io/view/Ubuntu%2020/job/ubuntu-20.04-from-scratch-ARM/62/ failed with symptoms that point to an Impala crash (Transport endpoint is not connected):

00:52:56.956 ERROR: INSERT OVERWRITE TABLE functional_parquet.alltypessmall partition (year, month)
00:52:56.956 SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col, year, month
00:52:56.956 FROM functional.alltypessmall
00:52:56.956 Traceback (most recent call last):
00:52:56.956   File "/home/ubuntu/Impala/bin/load-data.py", line 189, in exec_impala_query_from_file
00:52:56.956     result = impala_client.execute(query)
00:52:56.956   File "/home/ubuntu/Impala/tests/beeswax/impala_beeswax.py", line 191, in execute
00:52:56.956     handle = self.__execute_query(query_string.strip(), user=user)
00:52:56.956   File "/home/ubuntu/Impala/tests/beeswax/impala_beeswax.py", line 369, in __execute_query
00:52:56.956     self.wait_for_finished(handle)
00:52:56.956   File "/home/ubuntu/Impala/tests/beeswax/impala_beeswax.py", line 390, in wait_for_finished
00:52:56.956     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
00:52:56.956 ImpalaBeeswaxException: ImpalaBeeswaxException:
00:52:56.956  Query aborted:Exec() rpc failed: Network error: recv error from unknown peer: Transport endpoint is not connected (error 107)

Another build on a private instance (similar size, also running Ubuntu 20.04) crashed in a different location during dataload, leaving a number of JVM crash reports in /var/crash and several hs_err_pidNNN.log files in the Impala working directory.
(These are pretty large; I'm happy to send them to you directly if you think they would be helpful.)

My primary suspicion is the version mismatch between CDP_HADOOP_VERSION and IMPALA_HADOOP_CLIENT_BINARY_VERSION; see the inline comments.

http://gerrit.cloudera.org:8080/#/c/20737/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/20737/2/bin/impala-config.sh@207
PS2, Line 207: 3.3.6
I'm afraid this version is too new and differs too much from the regular Hadoop version; see the comment on line 253.


http://gerrit.cloudera.org:8080/#/c/20737/2/bin/impala-config.sh@253
PS2, Line 253: 3.1.1.7.2.18.0-369
I'm afraid this version of Hadoop is not really compatible with the significantly newer 3.3.6 that's specified for the ARM native libraries above.

3.1.1 was released 5 years ago; 3.3.6 is quite fresh, and it has recently received a full upgrade to the AWS Java SDK v2, among other changes.
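
To illustrate the mismatch, here is a minimal sketch of a sanity check that compares the upstream base versions of the two settings. It is not part of the patch; the helper name upstream_base_version is hypothetical, and I am assuming that line 207 sets IMPALA_HADOOP_CLIENT_BINARY_VERSION and line 253 sets CDP_HADOOP_VERSION, with the values quoted in the two comments:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check, not part of the change under review.
# Assumption: these are the values set on lines 207 and 253 of
# bin/impala-config.sh in Patch Set 2.
IMPALA_HADOOP_CLIENT_BINARY_VERSION="3.3.6"
CDP_HADOOP_VERSION="3.1.1.7.2.18.0-369"

# Print the leading major.minor.patch part of a version string,
# e.g. the upstream Hadoop release a CDP build is based on.
upstream_base_version() {
  echo "$1" | cut -d. -f1-3
}

hadoop_base="$(upstream_base_version "$CDP_HADOOP_VERSION")"
client_base="$(upstream_base_version "$IMPALA_HADOOP_CLIENT_BINARY_VERSION")"

if [ "$hadoop_base" != "$client_base" ]; then
  echo "WARNING: Hadoop is based on $hadoop_base, native client libraries on $client_base"
fi
```

With the two values above the check prints the warning, since 3.1.1 and 3.3.6 differ. A matching base version would not guarantee compatibility, but a mismatch this large seems worth flagging before dataload starts.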



-- 
To view, visit http://gerrit.cloudera.org:8080/20737

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide5ad327d6ce7c2a6b7d0ec4cf1dd53fef987720
Gerrit-Change-Number: 20737
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Dec 2023 17:58:07 +0000
Gerrit-HasComments: Yes